Reagents and methods for diagnosis of attention deficit hyperactivity disorder

ABSTRACT

The subject invention provides reagents and methods for diagnosing Attention Deficit Hyperactivity Disorder (ADHD) in an individual based on the presence of specific alleles of the DRD4 gene or other markers found within a region of strong linkage disequilibrium to the DRD4 7R allele in the DNA of the individual.

Attention Deficit Hyperactivity Disorder (ADHD) is a neurobehavioral disorder defined by symptoms of developmentally inappropriate inattention, impulsivity, and hyperactivity with early onset (American Psychiatric Association DSM-IV: Diagnostic and Statistical Manual of Mental Disorders (Am. Psychiatr. Assoc. Washington, D.C.) 4th Ed. 1994). Current estimates indicate that 3-6% of school age children are diagnosed with ADHD, making it the most prevalent disorder of childhood (Swanson, J M, Flodman, P, Kennedy, J, Spence, M A, Moyzis, R, Schuck, S, et al., Dopamine genes and ADHD. Neuroscience and Biobehavioral Reviews 2000; 24, 21-25). While the broad DSM-IV phenotype of ADHD almost certainly has multiple biological etiologies (Swanson, J, Deutsch, C, Cantwell, D, Posner, M, Kennedy, J L, Barr, C L, et al. Genes and Attention-Deficit Hyperactivity Disorder. Clinical Neuroscience Research 2001; 1, 207-216), numerous family, twin and adoption studies have documented a strong genetic basis (Faraone, S V, Biederman, J. Genetics of attention-deficit hyperactivity disorder. Child Adolesc Clin North Am 1994; 3, 285-291; Faraone, S V, Doyle, A E, Mick, E, and Biederman, J. Meta-analysis of the association between the 7-repeat allele of the dopamine D4 receptor gene and attention deficit hyperactivity disorder. Am J Psychiatry 2001; 158, 1052-1057).

Despite the high heritability of ADHD, initial genome scan studies have failed to identify genes of major effect (Fisher, S E, Francks, C, McCracken, J T, McGough, J J, Marlow, A J, MacPhie, I L, et al. A genomewide scan for loci involved in Attention-Deficit/Hyperactivity Disorder. Am J Hum Genet 2002; 70, 1183-1196), although a region on chromosome 16p13 has been implicated in subsequent studies by the same group (Smalley, S L, Kustanovich, V, Minassian, S L, Stone, J L, Ogdie, M N, McGough, J J, et al. Genetic linkage of attention-deficit/hyperactivity disorder on chromosome 16p13, in a region implicated in autism. Am J Hum Genet 2002; 71, 959-963). Such negative results are not unexpected for a complex genetic disorder like ADHD, where phenotypic heterogeneity is likely, and the practical but (to date) restricted sample sizes limit statistical power (Risch, N and Merikangas, K. The future of genetic studies of complex human diseases. Science 1996; 273, 1516-1517; Terwilliger, J D and Weiss, K M. Linkage disequilibrium mapping of complex disease: fantasy or reality? Current Opin Biotechnology 1998; 9, 578-594; Weiss, K M and Terwilliger, J D. How many diseases does it take to map a gene with SNPs? Nature Genetics 2000; 26, 151-157; Zwick, M E, Cutler, D J, and Chakravarti, A. Patterns of genetic variation in mendelian and complex traits. Ann Rev Genomics Hum Genet 2000; 1, 387-407; Sklar, P. Linkage analysis in psychiatric disorders: the emerging picture. Ann Rev Genomics Hum Genet 2002; 3, 371-413). Candidate gene studies, on the other hand, require much smaller sample sizes to achieve the same statistical power. The efficacy of a dopamine agonist drug, methylphenidate, in the treatment of ADHD has suggested that genes in the dopamine pathway may be involved in the disorder's etiology (Volkow, N D, Wang, G J, Fowler, J S, Logan, J, Franceschi, D, Maynard, L, et al. Relationship between blockade of dopamine transporters by oral methylphenidate and the increases in extracellular dopamine: therapeutic implications. Synapse 2002; 43, 181-187). This dopamine hypothesis of ADHD suggests a number of candidate genes that could logically be tested for their association with the disorder. The draft human genome sequence (International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 2001; 409, 860-921; Riethman, H C, Xiang, Z, Paul, S, Morse, E, Hu, X-L, Flint, J, Chi, H-C, Grady, D L, and Moyzis, R K. Integration of telomeric sequences with the draft human genome sequence. Nature 2001; 409, 948-951; Cowan, W M, Kopnisky, K L, and Hyman, S E. The Human Genome Project and its impact on Psychiatry. Annu Rev Neurosci 2002; 25, 1-50) has provided information sufficient to examine multiple candidate genes in parallel, often representing most of the proteins in a relevant biochemical pathway.

One of these candidate genes, DRD4 (Van Tol, H H M, Bunzow, J R, Guan, H-C, Sunahara, R K, Seeman, P, Niznik, H B and Civelli, O. Cloning of the gene for a human dopamine D4 receptor with high affinity for the antipsychotic clozapine. Nature 1991; 350, 610-614), located near the telomere of chromosome 11p, is one of the most variable human genes known (Lichter, J B, Barr, C L, Kennedy, J L, Van Tol, H H M, Kidd, K K, and Livak, K J. A hypervariable segment in the human dopamine receptor D4 (DRD4) gene. Human Molecular Genetics 1993; 2, 767-773; Chang, F-M, Kidd, J R, Livak, K J, Pakstis, A J, and Kidd, K K. The world-wide distribution of allele frequencies at the human dopamine D4 receptor locus. Hum Genetics 1996; 98, 91-101; Ding, Y-C, Wooding, S, Harpending, H C, Chi, H-C, Li, H-P, Fu, Y-X et al. Population structure and history in East Asia. Proc Natl Acad Sci USA 2000; 97, 14003-14006; Ding, Y-C, Chi, H C, Grady, D L, Morishima, A, Kidd, J R, Kidd, K K et al. Evidence of positive selection acting at the human dopamine receptor D4 gene locus. Proc Natl Acad Sci USA 2002; 99, 309-314).

What is needed are genetic markers useful in the diagnosis of ADHD, and methods for using the same.

SUMMARY OF THE INVENTION

The present invention provides a reagent useful for diagnosing attention deficit hyperactivity disorder (ADHD), comprising a polynucleotide corresponding to an allele of DRD4 associated with individuals exhibiting ADHD.

The present invention further provides a reagent useful for diagnosing ADHD, comprising a polynucleotide corresponding to the DRD4 7R allele.

The present invention further provides a reagent useful for diagnosing ADHD, comprising a polynucleotide corresponding to a marker the locus of which is within a block of linkage disequilibrium surrounding the DRD4 7R allele.

The present invention further provides a reagent useful for diagnosing ADHD, comprising a pair of oligonucleotides corresponding to an allele of DRD4 associated with individuals exhibiting ADHD.

The present invention further provides a reagent useful for diagnosing ADHD, comprising a pair of oligonucleotides corresponding to the DRD4 7R allele.

The present invention further provides a reagent useful for diagnosing ADHD, comprising a pair of oligonucleotides corresponding to a marker the locus of which is within a block of linkage disequilibrium surrounding the DRD4 7R allele.

The present invention further provides a method for diagnosing ADHD in an individual, comprising the steps of:

a) obtaining a tissue sample from the individual;

b) treating the sample so as to expose DNA present in the sample;

c) contacting the exposed DNA with a labeled DNA oligomer under conditions permitting hybridization of the DNA oligomer to any DNA complementary to the DNA oligomer present in the sample, the DNA complementary to the DNA oligomer containing the DRD4 7R allele;

d) removing unhybridized, labeled DNA oligomer; and

e) detecting the presence of any hybrid of the labeled DNA oligomer and DNA complementary to the DNA oligomer present in the sample, thereby detecting and diagnosing ADHD.

The present invention further provides a method for diagnosing ADHD in an individual, comprising the steps of:

a) obtaining a tissue sample from the individual;

b) treating the sample so as to expose DNA present in the sample;

c) contacting the exposed DNA with a labeled DNA oligomer under conditions permitting hybridization of the DNA oligomer to any DNA complementary to the DNA oligomer present in the sample, the DNA complementary to the DNA oligomer containing a marker within a region of strong linkage disequilibrium to the DRD4 7R allele;

d) removing unhybridized, labeled DNA oligomer; and

e) detecting the presence of any hybrid of the labeled DNA oligomer and DNA complementary to the DNA oligomer present in the sample, thereby detecting and diagnosing ADHD.

The present invention further provides a method for diagnosing ADHD in an individual, comprising the steps of:

a) obtaining a tissue sample from the individual;

b) providing an oligonucleotide complementary to the sense strand of the DRD4 gene;

c) providing an oligonucleotide complementary to the antisense strand of the DRD4 gene;

d) treating the sample so as to expose DNA present in the sample;

e) contacting the exposed DNA with the oligonucleotides under conditions permitting amplification of the DRD4 gene;

f) sequencing the product of the amplification; and

g) detecting the presence of the DRD4 7R allele in the sample, thereby detecting and diagnosing ADHD.

The present invention further provides a method for diagnosing ADHD in an individual, comprising the steps of:

a) obtaining a tissue sample from the individual;

b) providing an oligonucleotide complementary to the sense strand of a marker sequence found in an area of strong linkage disequilibrium with the DRD4 7R allele;

c) providing an oligonucleotide complementary to the antisense strand of the marker sequence;

d) treating the sample so as to expose DNA present in the sample;

e) contacting the exposed DNA with the oligonucleotides under conditions permitting amplification of the marker sequence;

f) sequencing the product of the amplification; and

g) detecting the presence of the marker sequence in the sample, thereby detecting and diagnosing ADHD.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Diagrammatic representation of the human DRD4 gene region. Exon positions are indicated by blocks (yellow: noncoding, orange: coding). The approximate positions of a 120 bp promoter region duplication (blue triangle), an exon 1 12 bp duplication (blue triangle), an exon 3 48 bp VNTR (blue triangle), and two intron 3 SNPs are indicated. 2R through 11R variants of the 48 bp VNTR are indicated below exon 3 (blue), along with their worldwide population frequencies determined by PCR analysis (Chang, F.-M., Kidd, J. R., Livak, K. J., Pakstis, A. J., and Kidd, K. K. (1996) Hum. Genet. 98, 91-101; Ding, Y.-C., Wooding, S., Harpending, H. C., Chi, H.-C., Li, H.-P., Fu, Y.-X., Pang, J.-F., Yao, Y.-G., Yu, J.-G. X., Moyzis, R., and Zhang, Y.-P. (2000) Proc. Natl. Acad. Sci. USA. 97, 14003-14006).

FIG. 2. Nucleotide and amino acid sequences of VNTR motifs. The nucleotide (SEQ. ID NO. 1) and corresponding amino acid (red) sequences of 35 DRD4 exon 3 48 bp repeat motifs are shown. Prior nomenclature (Lichter, J. B., Barr, C. L., Kennedy, J. L., Van Tol, H. H. M., Kidd, K. K. and Livak, K. J. (1993) Human Molecular Genetics 2, 767-773) for 19 of these motifs is indicated (α through ξ). The putative single-step origin of most of these motifs is indicated, either as a recombination event (R) or a mutation event (M). For example, the 7 motif is hypothesized to be a recombination between a 2 motif and a 3 motif (R 2/3) and the 8 motif is hypothesized to be a single point mutation of a 2 motif (M 2). Motifs 1 through 6, which account for the vast majority of observed haplotype variants (Table 1), are considered the progenitors. Motifs with no putative origin noted (for example, motif 15) have multiple possible progenitors.

FIG. 3. Proposed origin of DRD4 diversity. A simplified model for exon 3 48 bp repeat sequence diversity is shown, with only major recombination events indicated (FIG. 2). The major 2R, 4R, and 7R-alleles are shown in yellow, and the minor 3R, 5R, and 6R-alleles in gray, along with their hypothesized origins by unequal recombination (red arrows). Large red arrows indicate the putative multistep origin of the 7R-allele. Adjacent promoter region (L₁/S₁), exon 1 (L₂/S₂), and intron 3 (G-G/A-C) polymorphisms are indicated. The strong linkage of the L₁, L₂ and A-C polymorphisms with the DRD4 7R-allele is noted.

FIG. 4. Proposed origin of ADHD/DRD4 allele diversity. A model for DRD4 exon 3 repeat sequence diversity is shown. The ancestral 4R(1-2-3-4) and 7R(1-2-6-5-2-5-4) alleles are noted in yellow, with large red arrows indicating the multistep origin of the 7R-allele. The proposed mutational or recombinational origins of the 12 novel alleles reported in this study are indicated along the blue arrows. Amino acid changes are also indicated. Haplotype nomenclature as described in FIG. 2.

FIG. 5. A simplified diagram of complex genetic disorders. The left colored circles represent the potentially overlapping phenotypes classified together as a single disorder. In the current study, the refined phenotype of ADHD without comorbidity is proposed to represent one of the circles. The Gene 1 through Gene N displayed along the DNA molecule indicates our inability to estimate the number of genes associated with the disorder. Likewise, the double-headed arrows represent our inability to predict how these genes interact to produce the phenotype(s) depicted at left. Some fraction of the disorder may have a nongenetic cause (arbitrarily represented as 0.2 nongenetic in the diagram), for example brain damage in the case of ADHD (Swanson, J M, Oosterlaan, J, Murias, M, Schuck, S, Flodman, P, Spence, M A, et al. Attention deficit/hyperactivity disorder children with a 7-repeat allele of the dopamine receptor D4 gene have extreme behavior but normal performance on critical neuropsychological tests of attention. Proc Natl Acad Sci USA 2000; 97, 4754-4759). Genes 1 through N each account for some fraction of the disorder (arbitrarily represented as 0.2 each in the diagram). Two widely discussed models for how genetic variants predispose to common disorders are shown, the Common Variant-Common Disorder (CVCD) hypothesis, and the Allelic Heterogeneity or Rare Variant-Common Disorder (RVCD) hypothesis.

FIG. 6. Contrast between rare single gene disorders and common complex genetic disorders. For single gene disorders, for example Huntington Disease (Huntington's Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is unstable on Huntington's Disease chromosomes. Cell 1993; 72, 971-983) (left), predisposing alleles (indicated by a=0.0001) and the disease frequency (indicated by a/x=0.0002) are rare. Therefore, one observes a dramatic increase in allele frequency (and relative risk) in probands. For complex disorders related to common alleles, however, only modest increases in allele frequency (and relative risk) are expected. In the example shown (right), three predisposing alleles (DRD4 7R, b, c) in three different genes are hypothesized to interact. Each allele is proposed to be at polymorphic frequency in the population (0.05-0.12). Individuals with predisposing genotypes [(DRD4 7R/x)(b/x), (DRD4 7R/x)(c/x), (b/x)(c/x)] represent 0.05 of the population, the approximate frequency of ADHD (Faraone, S V, Doyle, A E, Mick, E, and Biederman, J. Meta-analysis of the association between the 7-repeat allele of the dopamine D4 receptor gene and attention deficit hyperactivity disorder. Am J Psychiatry 2001; 158, 1052-1057). The observed increase in alleles DRD4 7R, b, and c in probands ranges from 4-fold (if all cases are caused by these genes) to 2-fold (if only 50% of cases are caused by these genes). For example, a significant fraction of ADHD may have nongenetic causes, yet these cases will be included in our proband population (FIG. 5).

FIG. 7. Polymorphism distribution at the DRD4 locus. Seventy DRD4 polymorphisms are displayed using VG (Nickerson, D. A., Taylor, S. L., Weiss, K. M., Clark, A. G., Hutchinson, R. G., Steingard, J., et al. (1998) Nature Genet 19, 233-240), with individual variants aligned along the horizontal axis. Approximate locations of the variants along the DRD4 locus (GenBank AC021663) are indicated by blue lines reaching to the diagrammatic representation of the gene (above). In this representation, exon positions are represented by blocks (yellow, noncoding; orange, coding; +1=translation start), and the positions of Alu repetitive sequences by pointed blue blocks. The positions of a 120 bp upstream duplication and the exon 3 48 bp VNTR are indicated by green triangles. A 288 bp site (−809 to −521) in the promoter region that contains an anomalously high number of SNPs is indicated. These SNPs exhibit little 4R versus 7R frequency difference. Individuals (vertical axis) are grouped by VNTR length (4R/4R, 7R/7R, and 2R/2R) and geographic origin (African, European, etc.) as indicated. Homozygotes for the allele with the highest relative frequency (common allele) are indicated by blue squares, homozygotes for alternative (rare) alleles by yellow squares, and heterozygotes by red squares. The 7R/7R and 2R/2R individuals were greatly oversampled in comparison to their population frequency, and hence common and rare alleles were defined by the frequency in a randomly sampled population.

FIG. 8. Pairwise linkage disequilibrium (D′) at the DRD4 locus. The program GOLD (Abecasis, G. R., and Cookson, W. O. (2000) Bioinformatics 16, 182-183) was used to generate and display all pairwise values of LD for 31 DRD4 polymorphisms with minor allele frequencies >0.01. Separate calculations were performed on 4R/4R (top) and 7R/7R (bottom) populations. The color scale indicated grades LD values from 0.00 (blue) to 1.00 (red). At the short distances used in this study (<6 kb), LD values of approximately 0.6 are expected by chance (Kruglyak, L. (1999) Nature Genet 22, 139-144).

FIG. 9. SNP recombination fraction for DRD4 7R alleles. The observed percent recombination at the 18 SNPs from Table 5 is plotted versus distance from the 7R VNTR. The curve is an empirically determined least-squares fit to the data. The diagrammatic representation of the DRD4 locus is as described in FIG. 7.

FIG. 10. A diagrammatic model for DRD4 variant selection. DRD4 2R, 4R and 7R protein variants are shown aligned along a scale of relative efficiency for cAMP reduction (normalized to 4R=1.0), calculated from the data of Asghari et al. (Asghari, V., Sanyal, S., Buchwaldt, S., Paterson, A., Jovanovic, V., and Van Tol, H. H. M. (1995) J. Neurochem. 65, 1157-1165). The diagrammatic protein models were constructed using the rhodopsin crystal structure as a framework. The unusual derivation of the 7R allele from the ancestral 4R allele (approximately 42,500 years ago), and its increase in prevalence, is indicated by red to blue arrows. The subsequent derivation of the 2R allele from a 7R/4R recombination is indicated by multiple yellow arrows.

DETAILED DESCRIPTION OF THE INVENTION

All publications mentioned herein are incorporated herein by reference in their entireties. The publications discussed above, below and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventor is not entitled to antedate such disclosure by virtue of prior invention.

As discussed below in more detail, the present inventors have shown that a strong linkage disequilibrium (LD) exists between the 7R-allele of DRD4, disproportionately represented in individuals diagnosed with ADHD, and surrounding DRD4 polymorphisms. Markers within this large LD block thus are useful in the diagnosis of ADHD. It should be noted that due to the strong LD discovered by the present inventors, any marker within this region is potentially useful for diagnosing ADHD, as will be appreciated by one of skill in the art. Such new markers may be identified by techniques well known in the art. Accordingly, the diagnostic reagents of the present invention are not limited to specific DRD4 polymorphisms, but also include other markers now known or subsequently identified in the block of LD surrounding the 7R-allele of DRD4.

Evidence of Positive Selection Acting at the Human Dopamine Receptor D4 Gene Locus

Associations have been reported of the 7-repeat (7R) allele of the human dopamine receptor D4 (DRD4) gene with both attention deficit/hyperactivity disorder (ADHD) and the personality trait of novelty seeking. This polymorphism occurs in a 48 bp tandem repeat (VNTR) in the coding region of DRD4, with the most common allele containing four repeats (4R), and rarer variants containing two to eleven. Here, we show by DNA resequencing/haplotyping of 600 DRD4 alleles, representing a worldwide population sample, that the origin of 2R- through 6R-alleles can be explained by simple one-step recombination/mutation events. In contrast, the 7R-allele is not simply related to the other common alleles, differing by greater than 6 recombinations/mutations. Strong linkage disequilibrium (LD) was found between the 7R-allele and surrounding DRD4 polymorphisms, suggesting this allele is at least 5-10 fold “younger” than the common 4R-allele. Based on an observed bias towards nonsynonymous amino acid changes, the unusual DNA sequence organization, and the strong LD surrounding the DRD4 7R-allele, we propose that this allele originated as a rare mutational event that nevertheless increased to high frequency in human populations by positive selection.

The human DRD4 gene (Van Tol, H. H. M., Bunzow, J. R., Guan, H.-C., Sunahara, R. K., Seeman, P., Niznik, H. B. and Civelli, O. (1991) Nature 350, 610-614), located near the telomere of chromosome 11p, is one of the most variable human genes known. Most of this diversity is the result of length and single nucleotide polymorphism (cSNP) variation in a 48 bp tandem repeat (VNTR) in exon 3, encoding the third intracellular loop of this dopamine receptor. Variant alleles containing two (2R) to eleven (11R) repeats are found, with the resulting proteins having 32 to 176 amino acids at this position. Interestingly, the frequency of these alleles varies widely. The 7R-allele, for example, has an extremely low incidence in Asian populations, yet a high frequency in the Americas.
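The stated range of 32 to 176 amino acids follows directly from the 48 bp repeat unit, since each repeat encodes 16 codons. A minimal sketch of that arithmetic is given here; the function and constant names are illustrative only and are not part of the disclosure.

```python
# Length of the exon 3 VNTR segment for each allele, assuming each
# 48 bp repeat is translated in frame (48 / 3 = 16 amino acids).
REPEAT_BP = 48   # repeat unit length stated in the text
CODON_BP = 3     # nucleotides per codon


def vntr_amino_acids(repeats: int) -> int:
    """Return the number of amino acids encoded by `repeats` tandem repeats."""
    return repeats * REPEAT_BP // CODON_BP


if __name__ == "__main__":
    for repeats in range(2, 12):  # 2R through 11R
        bp = repeats * REPEAT_BP
        print(f"{repeats}R: {bp} bp, {vntr_amino_acids(repeats)} amino acids")
```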

A number of investigations have found associations between particular alleles of this highly variable gene and behavioral phenotypes (LaHoste, G. J., Swanson, J. M., Wigal, S. B., Glabe, C., Wigal, T., King, N., and Kennedy, J. L. (1996) Molecular Psychiatry 1, 21-24; Swanson, J. M., Flodman, P, Kennedy, J, Spence, M. A., Moyzis, R, Schuck, S, Murias, M, Moriarity, J, Barr, C, Smith, M, et al., (2000) Neuroscience and Biobehavioral Reviews 24, 21-25; Swanson, J. M., et al. (2000) Proc. Natl. Acad. Sci. USA 97, 4754-4759; Ebstein, R. P., Novick, O., Umansky, R., Priel, B., Osher, Y., Blaine, D., Bennett, E. R., Nemanov, L., Katz, M., and Belmaker, R. H. (1996) Nature Genetics 12, 78-80; Benjamin, J., Li, L., Patterson, C., Greenberg, B. D., Murphy, D. L. and Hamer, D. H. (1996) Nature Genetics 12, 81-84). While initial studies suggested that the 7R-allele of the DRD4 gene might be associated with the personality trait of novelty seeking, the most reproduced association is between the 7R-allele and attention deficit/hyperactivity disorder (ADHD) (Swanson, J., Deutsch, C., Cantwell, D., Posner, M., Kennedy, J., Barr, C., Moyzis, R., Schuck, S., Flodman, P., and Spence, M. A. (2001) Clinical Neuroscience Research 1, 207-216). ADHD is the most prevalent disorder of early childhood, affecting an estimated 3% of elementary school children. As defined by DSM-IV criteria (Am. Psychiatr. Assoc. (1994) DSM-IV: Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition) (Am. Psychiatr. Assoc., Washington, D.C.)), ADHD consists of developmentally inappropriate inattention, impulsivity and hyperactivity with early onset (before the age of 7). Evidence of a strong genetic component of ADHD has come from a variety of twin, adoption, and family studies (Faraone, S. V. and Biederman, J. (1994) Child Adolesc. Clin. North Am. 3, 285-291; Thapar, A., Holmes, J., Poulton, K., and Harrington, R. (1999) Br. J. Psychiatry 174, 105-111).
The efficacy of methylphenidate in the treatment of ADHD indicated that genes in the dopamine pathway might play a role in the syndrome's etiology (Volkow, N. D., Wang G. J., Fowler, J. S., Fischman, M., Foltin, R., Abumrad, N. N., Gatley, S. J., Logan, J, Wong, C, Gifford, A, et al., (1999) Life Sci. 65, 7-12). Initial association studies found ADHD probands to exhibit an increased frequency of DRD4 7R-alleles in comparison to controls. Eight separate replications of this initial observation have now been reported. As in all association studies, however, one cannot assume that the presence of a DRD4 7R-allele is either necessary or sufficient to “cause” ADHD. Further work will be required to understand the genetic/environmental factors underlying this behavior.

Nevertheless, given the likely functional importance of this change in the DRD4 protein, in a region that couples to G-proteins and mediates post-synaptic effects (Asghari, V., Sanyal, S., Buchwaldt, S., Paterson, A., Jovanovic, V., and Van Tol, H. H. M. (1995) J. Neurochem. 65, 1157-1165), these association studies have generated considerable interest. In particular, this association is consistent with the common variant-common disorder (CVCD) hypothesis, which proposes that the high frequency of many complex genetic diseases is related to common DNA variants (Collins, F. S., Guyer, M. S., and Chakravarti, A. (1997) Science 278, 1580-1581; Zwick, M. E., Cutler, D. J., and Chakravarti, A. (2000) Ann. Rev. Genomics Hum. Genet. 1, 387-407). However, many questions remain as to the nature of the DRD4/ADHD association. One would like to know 1) if particular 7R-allele variants are associated with ADHD, 2) the population distribution of variant DRD4 alleles, and/or 3) whether the observed marker is in linkage disequilibrium (LD) with other etiologically relevant polymorphisms. Given the known high level of sequence polymorphism of this gene, PCR-based DNA resequencing is the most efficient and accurate method to address these questions. Here, we use this approach to determine A) the population distribution of DRD4 exon 3 haplotypes and B) their relative association with adjacent polymorphisms. We present haplotype data indicating that the DRD4 7R-allele originated as a rare mutational event (or events) that nevertheless increased to high frequency in human populations by positive selection.

Methods

Population Samples. Samples were obtained as reported previously (Chang, F.-M., Kidd, J. R., Livak, K. J., Pakstis, A. J., and Kidd, K. K. (1996) Hum. Genet. 98, 91-101; Ding, Y.-C., Wooding, S., Harpending, H. C., Chi, H.-C., Li, H.-P., Fu, Y.-X., Pang, J.-F., Yao, Y.-G., Yu, J.-G. X., Moyzis, R., and Zhang, Y.-P. (2000) Proc. Natl. Acad. Sci. USA. 97, 14003-14006). The origins of the 600 alleles reported in this study, based on geographical/ethnic origin, are as follows: North and South America, 12.7% (76 alleles); Europe, 36.7% (220 alleles); Asia, 27.3% (164 alleles); Africa, 20.3% (122 alleles); and Pacific, 3.0% (18 alleles). Lymphoblastoid cell lines have been established for most of these population samples, and methods for transformation, cell culture, and DNA purification have been described. For LD studies of the DRD4 4R-G-G SNP association, an additional 288 alleles (approximately equally derived from African, Asian and European sources) were used. All persons gave their informed consent prior to their inclusion in this study, carried out under protocols approved by the Human Subjects Committees at the participating institutions.
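The allele counts and percentages given for the population sample can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; the dictionary and function names are assumptions, while the counts are those stated in the paragraph above.

```python
# Allele counts by geographical/ethnic origin, as stated in the text.
ORIGINS = {
    "North and South America": 76,
    "Europe": 220,
    "Asia": 164,
    "Africa": 122,
    "Pacific": 18,
}


def origin_percentages(counts: dict) -> dict:
    """Return each origin's share of the total, as a percentage to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("total alleles:", sum(ORIGINS.values()))
    for name, pct in origin_percentages(ORIGINS).items():
        print(f"{name}: {pct}%")
```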

PCR Amplification and DNA sequencing. PCR amplification of the DRD4 promoter polymorphism was conducted as described (Seaman, M. I., Fisher, J. B., Chang, F.-M., and Kidd, K. K. (1999) Am. J. Med. Genet. 88, 705-709; McCracken, J. T., Smalley, S. L., McGough, J. J., Crawford, L., Del'Homme, M., Cantor, R. M., Liu, A., and Nelson, S. F. (2000) Mol. Psychiatry 5, 531-536). The program OLIGO 6.0 was used to select primer pairs for the exon 1 polymorphism (Catalano, M., Nobile, M., Novelli, E., Nothen, M. M., and Smeraldi, E. (1993) Biol. Psychiat. 34, 459-464) (5′-TGGGCCGCCGCATTCGT-3′ (SEQ. ID NO. 2) and 5′-GGTGGGTGTATCGCCGAGGGA-3′ (SEQ. ID NO. 3); 661-nucleotide product) and the exon 3 VNTR (5′-CGTACTGTGCGGCCTCAACGA-3′ (SEQ. ID NO. 4) and 5′-GACACAGCGCCTGCGTGATGT-3′ (SEQ. ID NO. 5); 705-nucleotide product for the 4R-allele). For some amplifications of the VNTR, primers described previously were used (Lichter, J. B., Barr, C. L., Kennedy, J. L., Van Tol, H. H. M., Kidd, K. K. and Livak, K. J. (1993) Human Molecular Genetics 2, 767-773). The alternative primers were chosen farther from the VNTR, to minimize out-of-register hybridization during amplification. PCR reactions were conducted in 25 microliter volumes, containing 100 ng genomic DNA, 200 micromolar dXTPs, 0.5 micromolar of each primer, 1X PCR buffer (Qiagen), 1X Q-solution (Qiagen) and 0.625 units Taq DNA polymerase (Qiagen). Amplification was performed using Perkin-Elmer 9700 thermal cyclers. A 20 second, 96 degrees C. hot start was used, followed by 40 cycles of 95 degrees C. for 20 seconds and 68 degrees C. for 1 minute. Following a 4-minute chase at 72 degrees C., excess primers were eliminated with 0.5 units of Shrimp Alkaline Phosphatase (SAP, Amersham Life Science), 0.1 unit of Exonuclease I (Exo I, Amersham Life Science) and 1X SAP buffer (Amersham Life Science). The SAP/Exo I reaction was carried out at 37 degrees C. for 1 hour, followed by a 15-minute heat inactivation at 72 degrees C.
The DNA from the SAP/Exo I reaction was used directly for DNA sequencing. For most individuals, the two allelic PCR products were first separated on 1.2% agarose gels. DNA cycle sequencing was conducted by standard techniques, using ABI 377 and 3700 automated sequencers (Riethman, H. C., Xiang, Z., Paul, S., Morse, E., Hu, X.-L., Flint, J., Chi, H.-C., Grady, D. L., and Moyzis, R. K. (2001) Nature 409, 948-951). DNA sequences of the DRD4 haplotypes reported herein have been submitted to GenBank (Accession numbers AF395210 through AF395264).
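The four primers named in this section can be summarized by length and GC content, which illustrates why the highly GC-rich template calls for the Q-solution and the 68 degrees C. annealing/extension step. The following is a sketch only: the sequences are those of SEQ. ID NOS. 2 through 5 as given above, while the dictionary keys and the helper name gc_fraction are illustrative assumptions and not part of the disclosed method.

```python
# Primer sequences as listed in the text (5' to 3').
PRIMERS = {
    "SEQ ID NO 2 (exon 1)": "TGGGCCGCCGCATTCGT",
    "SEQ ID NO 3 (exon 1)": "GGTGGGTGTATCGCCGAGGGA",
    "SEQ ID NO 4 (exon 3 VNTR)": "CGTACTGTGCGGCCTCAACGA",
    "SEQ ID NO 5 (exon 3 VNTR)": "GACACAGCGCCTGCGTGATGT",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC fraction {gc_fraction(seq):.2f}")
```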

K_(a)/K_(s) and Allele age calculations. K_(a)/K_(s) ratios were calculated by standard methods (Kimura, M. (1968) Nature 217, 624-626; Kreitman, M. (2000) Ann. Rev. Genomics Hum. Genet. 1, 539-559). Putative recombinant haplotypes were not considered independent events. Allele age calculations were conducted by standard methods (Harpending, H. and Rogers, A. (2000) Annu. Rev. Genomics Hum. Genet. 1, 361-385; Kimura, M. and Ohta, T. (1973) Genetics 75, 199-212; Slatkin, M. and Rannala, B. (2000) Ann. Rev. Genomics Hum. Genet. 1, 225-249; Serre, J. L., Simon-Bouy, B., Mornet, E., Jaume-Roig, B., Balassopoulou, A., Schwartz, M, Taillandier A, Boue J, Boue A., (1990) Hum. Genet. 84, 449-454). Briefly:

1) Calculated from Population Frequency.

E(t₁)=[−2p/(1−p)] ln(p), where E(t₁)=expected age, time is measured in units of 2N generations, and p=population frequency. For DRD4, p=19.2% for the 7R-allele and 65.1% for the 4R-allele. A generation time of 20-25 years and N=10,000 were assumed (regarded as a minimum estimate of the effective population size of modern humans during the period prior to recent growth).

2) Calculated from Intra-Allelic Variation.

t=[1/ln(1−c)] ln[(x(t)−y)/(1−y)], where t=allele age, c=recombination rate, x(t)=frequency in generation t, and y=frequency on normal chromosomes. Assuming the origin of the 7R-allele was on a L₁L₂(7R)A-C haplotype, for the (7R)A-C association c=0.0000136 (from the average recombination rate per Mb times the VNTR-SNP distance), x(t)=97% (the percent of A-C SNPs associated with DRD4 7R-alleles), and y=13.9% (the percent of A-C SNPs associated with African DRD4 4R-alleles, assumed to be the “normal” allele). For the promoter polymorphism L₁(7R) association, c=0.000165, x(t)=90.8%, and y=61.9%.
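The two allele age estimates above can be evaluated numerically as follows. This is a minimal sketch under stated assumptions: the function names are illustrative, the frequency-based age is in units of 2N generations with N=10,000 as stated in the text, and the conversion to years uses the 20-25 year generation time given for the first method (applying it to the second method is an assumption of this sketch).

```python
import math


def expected_age_2n_units(p: float) -> float:
    """Frequency-based age, E(t1) = [-2p/(1-p)] ln(p), in units of 2N generations."""
    return (-2.0 * p / (1.0 - p)) * math.log(p)


def expected_age_generations(p: float, n_eff: int = 10_000) -> float:
    """Convert the frequency-based age to generations (N=10,000 per the text)."""
    return expected_age_2n_units(p) * 2 * n_eff


def intra_allelic_age_generations(c: float, x_t: float, y: float) -> float:
    """LD-based age, t = [1/ln(1-c)] ln[(x(t)-y)/(1-y)], in generations."""
    return math.log((x_t - y) / (1.0 - y)) / math.log(1.0 - c)


if __name__ == "__main__":
    for label, p in (("7R", 0.192), ("4R", 0.651)):
        gens = expected_age_generations(p)
        print(f"{label} frequency-based age: {gens:,.0f} generations "
              f"({gens * 20:,.0f} to {gens * 25:,.0f} years)")

    cases = {
        "(7R)A-C": (0.0000136, 0.97, 0.139),
        "L1(7R)": (0.000165, 0.908, 0.619),
    }
    for label, (c, x_t, y) in cases.items():
        gens = intra_allelic_age_generations(c, x_t, y)
        print(f"{label} LD-based age: {gens:,.0f} generations "
              f"({gens * 20:,.0f} to {gens * 25:,.0f} years)")
```

With the parameter values given in the text, the LD-based estimates for the 7R-allele come out far smaller than its frequency-based estimate, which is the discrepancy the allele age argument relies on.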

Results and Discussion

Primer sets were chosen to amplify the four exons of the highly GC-rich DRD4 gene, as well as the adjacent promoter region and splice junctions (FIG. 1). Initial resequencing of the entire promoter and coding region of the DRD4 gene from 20 ADHD probands (data not shown) uncovered a number of polymorphisms reported previously. These polymorphisms included two insertion/deletion polymorphisms, one in the promoter region (4.3 kb upstream of the VNTR) and one in exon 1 (2.7 kb upstream of the VNTR; see FIG. 1). In addition, a number of new coding SNPs were uncovered in the exon 3 48 bp VNTR, as well as two previously unreported SNPs in intron 3, 20 nucleotides apart and approximately 350 bp downstream from the center of the VNTR (FIG. 1). Given the high level of VNTR polymorphism identified in this initial sample, a more extensive PCR-resequencing of 600 exon 3 VNTR alleles was conducted, obtained from a worldwide population sample (Table 1 and FIG. 2). This sample contained individuals representing most major geographical origins (see Methods). The majority of individuals were heterozygotes, and the two allelic PCR products could be separated by gel electrophoresis prior to sequencing, providing unambiguous haplotypes. Altogether, we screened over 450,000 bp of genomic DNA and 2,968 48 bp repeats.
TABLE 1
Haplotypes of 600 DRD4 exon 3 alleles

Allele   F        N     Haplotypes (number of alleles, motif sequence)
2R       0.088    55     43  1-4
                         12  30-4*
3R       0.024    36     16  1-7-4
                          9  1-2-4
                          4  1-11-33*
                          3  1-9-4
                          1  1-2-22
                          1  1-2-21
                          1  1-2-31
                          1  1-2-32
4R       0.651    250   238  1-2-3-4
                          3  1-2-14-4
                          2  1-2-13-4
                          2  1-2-12-4
                          1  1-17-3-4
                          1  1-9-12-4
                          1  1-8-3-4
                          1  1-10-3-4
                          1  1-9-3-4
5R       0.016    27     12  1-3-2-3-4*
                          4  1-2-13-34-4*
                          3  1-2-2-3-4
                          2  1-2-6-5-4
                          2  1-11-2-3-4
                          1  1-3-2-14-4
                          1  1-2-6-23-4
                          1  1-2-3-9-4
                          1  1-2-3-27-4
6R       0.022    24     16  1-2-3-2-3-4
                          2  1-2-6-5-2-20
                          2  1-2-6-5-2-4
                          1  1-2-14-17-2-4
                          1  1-6-5-2-5-4
                          1  1-2-13-2-5-19
                          1  24-6-5-2-5-4
7R       0.192    199   177  1-2-6-5-2-5-4
                          5  1-2-6-5-2-5-19*
                          3  1-2-6-5-2-3-4
                          3  1-2-6-5-13-5-4*
                          2  1-8-25-5-2-5-4
                          2  1-2-3-5-2-5-4
                          1  1-2-6-5-2-13-4
                          1  1-2-29-17-2-5-4
                          1  1-2-6-2-2-5-4
                          1  1-8-25-5-2-3-4
                          1  1-2-6-16-2-3-4
                          1  1-2-6-5-2-14-4
                          1  1-2-3-17-2-5-4
8R       0.006    6       2  1-2-6-5-17-2-13-35*
                          1  1-2-6-5-2-2-5-4
                          1  1-2-6-26-5-26-3-35
                          1  1-2-6-26-5-26-3-4
                          1  1-2-6-18-5-18-3-4
9R       <0.001   1       1  1-8-25-5-2-5-2-23-4
10R      <0.001   1       1  1-2-15-6-2-6-5-2-5-4
11R      <0.001   1       1  1-2-3-27-5-23-25-5-2-5-28

F, observed allele frequency in 2,836 chromosomes from 37 worldwide human populations (3, 17); N, allele number identified by sequence analysis in this study (non-4R alleles were oversampled by 2-3-fold); Haplotypes are indicated using the repeat motif nomenclature proposed (FIG. 2). Alleles with adjacent asterisks indicate common variants found only in a single population sample (2R 30-4, Surui; 3R 1-11-33, [population name missing]; 5R 1-3-2-3-4, Chinese; 5R 1-2-13-34-4, [population name missing]; 7R 1-2-6-5-2-5-19, [population name missing]; 7R 1-2-6-5-13-5-4, [population name missing]; 8R 1-2-6-5-17-2-13-35, [population name missing]). Alleles with a single representation by definition were found in only one population.

In the 600 chromosomes sequenced, 56 different haplotypes were found (Table 1). These haplotypes were composed of 35 distinct 48-bp variant motifs (FIG. 2), 19 of which were reported previously (designated Alpha through Xi in FIG. 2). We propose that these DRD4 48 bp variant motifs be given numbers as shown, rather than the letters used previously, since there are not enough characters in the Greek alphabet. We propose that DRD4 exon 3 variants be designated in the format shown, i.e., the most common 4R allele being designated 4R(1-2-3-4), etc.

We intentionally oversampled non-4R-alleles approximately two-fold, since little sequence variation was uncovered in the common 4R-allele (Table 1), even though it represents 65 percent of the world population frequency. Most of the haplotypes in this sample (85.7%) were found at frequencies less than 1% (Table 1). Looking at nucleotide diversity among variants defined by their VNTR number, the common 2R, 4R, and 7R-alleles exhibit the least diversity, with 78.2%, 95.2%, and 88.9% of the alleles respectively represented by the most common 2R(1-4), 4R(1-2-3-4), and 7R(1-2-6-5-2-5-4) haplotypes (Table 1). In contrast, while the 3R, 5R, 6R, and 8R alleles are rarer, they have proportionally more variants (Table 1). This unusual pattern of allele diversity is clearly not a simple length effect, i.e., one in which longer alleles have greater diversity. Many population specific rare haplotypes were observed. Examples include the 2R(30-4) haplotype found only in the Surui (South America) sample, and the 5R(1-3-2-3-4) haplotype found only in the Han Chinese (Asian) sample (Table 1 and FIG. 2).
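The quoted shares of the most common haplotype within each length class follow directly from the counts in Table 1. A minimal sketch of that arithmetic, with hypothetical variable and function names:

```python
# Counts read from Table 1: (alleles carrying the most common haplotype,
# total alleles sequenced) for each common length class.
MOST_COMMON_COUNTS = {
    "2R(1-4)": (43, 55),
    "4R(1-2-3-4)": (238, 250),
    "7R(1-2-6-5-2-5-4)": (177, 199),
}


def share_percent(count: int, total: int) -> float:
    """Percentage of alleles in a length class carrying a given haplotype."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for haplotype, (count, total) in MOST_COMMON_COUNTS.items():
        print(f"{haplotype}: {share_percent(count, total)}% of {total} alleles")
```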

The pattern of nucleotide variation observed in the VNTR haplotypes is not random (FIG. 2). Most DNA sequence variants change the amino acid sequence, sometimes quite dramatically (e.g., Gln to Pro; FIG. 2). Although many of these variants are related mutational events (below), one can account for these relationships in calculating Kₐ/Kₛ (the number of amino acid replacements per site divided by the estimated number of synonymous changes per site). Values of Kₐ/Kₛ greater than 1 are usually taken to be a stringent indicator of positive selection at the observed DNA segment. For a tandem repeat sequence, many assumed relationships can be inferred, and hence different Kₐ/Kₛ ratios calculated. For all assumed relationships of the DRD4 variants, however, Kₐ/Kₛ > 1. For example, assuming that the most abundant 1 through 6-variant motifs (FIG. 2) all have a common origin, and that diversity was generated by both mutation and recombination (below), a Kₐ/Kₛ value of 3 is obtained. Expanding this analysis to include between-species divergence (a powerful method to improve these calculations) is not possible, due to the rapid de novo generation of variation in this VNTR in primate lineages (Livak, K. J., Rogers, J., and Lichter, J. B. (1995) Proc. Natl. Acad. Sci. USA 92, 427-431).

Standard approaches to defining evolutionary relationships between these haplotypes are not applicable, due to the repetitive nature of the DNA sequence. Based on the observed DNA sequences and their nucleotide variations, however, it is straightforward to propose a simple origin for the majority of these haplotypes (FIG. 3; Table 1). One-step recombination/mutation events between the most common alleles can account for nearly all of the observed variation of the 2R through 6R alleles. FIG. 3 is a simplified diagram of the most common recombination events proposed. While the inferred nucleotide sequence of an ancestral DRD4 cannot be determined, all alleles in a particular primate species appear to be derived from a relatively recent common ancestor. The most prevalent 4R-allele is proposed as the human progenitor allele, based on 1) limited sequence data reported for primate DRD4 4R-alleles, 2) the lower level of LD for polymorphisms surrounding this allele (as discussed below), and 3) the sequence motif arrangements of the non-4R alleles. Unequal recombination between two 4R(1-2-3-4) alleles would produce the observed common 2R through 6R alleles (FIG. 3). The position of crossover determines the resulting sequence. For example, the most common 3R(1-7-4) and 3R(1-2-4) alleles differ only in the position of crossover, either within or after the second repeat (FIG. 3; Table 1). Thus, the known high frequency of unequal recombination between tandem repeats (Jeffreys, A. J., Neil, D. L., and Neumann, R. (1998) EMBO J. 17, 4147-4157) can account for most of the observed diversity of the DRD4 gene.
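The one-step unequal recombination model can be illustrated with a deliberately simplified sketch. It assumes that crossovers fall only at internal boundaries between repeats, so crossovers within a repeat (which create chimeric motifs such as motif 7 in 3R(1-7-4)) are not modeled. The function name and the list representation of haplotypes are hypothetical.

```python
from typing import List, Set, Tuple

FOUR_R = [1, 2, 3, 4]                 # most common 4R(1-2-3-4) haplotype
SEVEN_R = (1, 2, 6, 5, 2, 5, 4)       # most common 7R haplotype


def boundary_crossover_products(left: List[int],
                                right: List[int]) -> Set[Tuple[int, ...]]:
    """Enumerate products of one unequal crossover between two alleles.

    The product keeps the first i repeats of `left` and then continues
    with `right` from repeat j onward, where i and j range over the
    internal repeat boundaries only (a simplifying assumption).
    """
    products = set()
    for i in range(1, len(left)):
        for j in range(1, len(right)):
            products.add(tuple(left[:i] + right[j:]))
    return products


if __name__ == "__main__":
    products = boundary_crossover_products(FOUR_R, FOUR_R)
    for hap in sorted(products, key=lambda h: (len(h), h)):
        print(f"{len(hap)}R({'-'.join(map(str, hap))})")
    print("common 7R haplotype produced:", SEVEN_R in products)
```

Under this simplification a single 4R/4R crossover yields products of two to six repeats, including the observed 2R(1-4), 3R(1-2-4), 5R(1-2-2-3-4), and 6R(1-2-3-2-3-4) haplotypes, but never the common 7R haplotype.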

In addition to unequal crossovers, single point mutations are evident in this population sample (Table 1 and FIG. 2). For example, with one exception, all 2R alleles worldwide have the sequence 2R(1-4) (Table 1). All twelve 2R alleles resequenced from Surui (South American) DNA were found to contain a single point mutation, the 2R(30-4) allele (Table 1 and FIG. 2). This mutation, a C to T change in the first repeat, does not alter the amino acid sequence, and likely has a recent (less than 10,000-20,000 year) origin.

In contrast, the formation of the observed 7R and higher alleles cannot be explained by simple one-step recombination/mutation events from the 4R(1-2-3-4) haplotype (FIG. 3). The generation of a 7R allele from the most prevalent 4R allele would require at least one recombination and 6 mutations. Even allowing for more complicated gene conversion events, multiple low probability steps are needed to convert a 4R allele into a 7R allele (FIG. 3). For example, the central 5-variant motif found in the common 7R(1-2-6-5-2-5-4) haplotype could be produced by a recombination between two 4R-alleles. Recombination between the terminal 4-variant motif of one 4R-allele and the initial 1-variant motif of the second 4R-allele would yield a 7R(1-2-3-5-2-3-4) haplotype (FIG. 2). Three additional mutations in each of the two 3-variant motifs in this putative 7R-haplotype are then required to produce the current 7R(1-2-6-5-2-5-4) haplotype. Four of these six nucleotide changes are nonsynonymous, altering the amino acid sequence (Ser to Gly, Gln to Pro, Ala to Pro, and Ser to Gly; FIG. 2). While gene conversion rather than mutation could be proposed as the mechanism to “insert” these nucleotide changes in a hypothetical 7R(1-2-3-5-2-3-4) allele, two unlikely events, one involving 7R-7R allele gene conversion, would be necessary (FIGS. 2 and 3).

None of these putative “intermediate” 7R haplotypes were observed in this worldwide population sample. Our sample included 47 7R-alleles sequenced from individuals of African origin, thought to contain populations with the greatest genetic diversity and age. It is unlikely, then, that “intermediate” 7R haplotypes exist at high frequency. It is not our intention, however, to propose a specific origin of the DRD4 7R-allele. Rather, we wish to emphasize that, based on DNA sequence analysis, the DRD4 7R-allele appears to be quite distinct from the common 2R through 6R alleles. It is impossible to determine if the origin of the DRD4 7R-allele was a single, highly unlikely event, or a series of unlikely events (FIG. 3).

Regardless of the mechanism of origin of the DRD4 7R-allele, it is clearly capable of participating in recombination events with the other alleles. Most of the rare 7R haplotypes observed appear to be recombination events, mostly with the common 4R(1-2-3-4) allele (Table 1). For example, the 7R(1-2-6-5-2-3-4) haplotype appears to be a recombination between a 4R(1-2-3-4) allele and a 7R(1-2-6-5-2-5-4) allele (Table 1 and FIG. 2). This origin was confirmed by analyzing SNPs outside the recombination region (see below). Further, the origin of some of the rare 5R and 6R alleles and all of the 8R and higher alleles can be explained by recombinations involving a 7R allele, since they contain the 6-variant motif, unique to the 7R allele (FIG. 2 and Table 1). Many of these 8R and higher alleles, however, appear to have more complicated origins, based on DNA sequence analysis (Table 1 and FIG. 2).

This model (FIG. 3) explains the apparent anomaly in the observed haplotype diversity noted above (Table 1), where the most abundant (and ancient, see below) 4R-allele has the lowest nucleotide diversity. If recombination is the predominant generator of diversity, then the majority of 4R/4R recombination events are predicted to have unchanged nucleotide sequence. Such events can only be inferred by recombination of outside markers. Only when out-of-register recombination occurs will new nucleotide sequence (and length) variants be generated (FIG. 3). The observed pattern of haplotype diversity is consistent with a predominantly “2-allele” system (4R and 7R), with most of the rarer variants generated by recombination from these two haplotypes (FIG. 3).

The unusual nature of the sequence organization of the DRD4 7R-allele, suggesting it arose as a rare mutational event, led us to determine if differences in LD exist between the 4R and 7R-alleles. The haplotype of two adjacent intronic SNPs (G/A-G/C; FIG. 1) could be directly determined, since they were present on the same PCR product used to amplify the 48 bp VNTR. Strong LD was found between the A-C SNP pair and the 7R-allele (FIG. 3). Ninety-seven percent of 7R-alleles were associated with the A-C SNP pair (66 out of 68 examined). The two 7R alleles associated with G-G SNPs were 7R/4R recombinant haplotypes, as determined originally from DNA sequence analysis (above). In contrast, both the G-G and A-C SNP pairs are associated with DRD4 4R-alleles (487 examined alleles). However, the G-G pair is most frequent, representing 86.1% of the African sample, but up to 98.6% of our Asian sample.

All African 7R-alleles were associated with the A-C haplotypes, while only 13.9% of African 4R-alleles were associated with the A-C haplotype. DNA sequence analysis of several chimp and bonobo samples (data not shown) indicates that the G-G SNP pair is likely the ancestral sequence (FIG. 3). Thus, it appears that the original DRD4 7R allele arose on this rarer A-C SNP background. A sample of 73 2R, 3R, 5R, and 6R-alleles showed approximately equal association with the G-G and A-C SNPs, consistent with their proposed recombinational origin from both the 4R and 7R-alleles (FIG. 3). Interestingly, all 26 Asian 2R-allele samples examined showed association with the A-C SNPs, suggesting their origin from recombinations involving 7R-alleles (FIG. 3).

Similar results were obtained for more distant promoter and exon 1 insertion/deletion polymorphisms (FIG. 1). In this case association was inferred indirectly from data obtained for our prior population studies and PCR analysis of a subset of the individuals used in this study. For forty samples where parental DNA was also available and could be genotyped for these markers, phase could be directly inferred. Strong association was observed between the long (duplicated) L₁ promoter polymorphism (FIG. 1) and the 7R-allele (FIG. 3), with 90.8% of 7R-alleles associated with L₁ (607 alleles analyzed). In contrast, the L₁ polymorphism is coupled with only 61.9% of 4R-alleles (2102 alleles analyzed). While population specific variation was observed (for example, more L₁-4R coupling in Chinese than African populations), little overall L₁-4R linkage was detected (FIG. 3). The closer L₂ polymorphism in exon 1 (FIG. 1) was associated with 93.4% of 7R-alleles and 86.4% of 4R-alleles, a relative difference similar to that observed for the L₁-7R and L₁-4R association. The L₂/S₂ polymorphism is in a coding region, however, and selective constraints may be influencing allele frequency as well (Seaman, M. I., Chang, F.-M., Deinard, A. S., Quinones, A. T., and Kidd, K. K. (2000) J. Exp. Zool. 288, 32-38).

Standard methods of estimating coalescence time for these alleles are not applicable, given the repetitive nature of the region and the high recombination frequency. However, calculations of allele age based on the relatively high worldwide population frequency of the DRD4 4R and 7R-alleles suggest that these alleles are ancient (>300,000 years old) (see Methods). On the other hand, calculations of allele age based on the observed intra-allelic variability (see Methods) suggest the 7R-allele is 5-10 fold “younger” (30,000-50,000 years old). Such large discrepancies between allele ages calculated by these two methods are usually taken as evidence that selection has increased the frequency of the allele to higher levels than expected by random genetic drift. The absolute values of these estimates are greatly affected by the assumptions used in their computations, for example the assumed recombination frequency. We have used conservative estimates of recombination frequency, based on the average observed for the terminal 20 Mb of 11p (International Human Genome Sequencing Consortium (2001) Nature 409, 860-921). Given the observed high recombination at this locus (Table 1 and FIG. 3), it is likely that the actual age of the 7R-allele is even younger, and further LD analysis will refine these estimates. The important conclusion, however, is that regardless of the parameters assumed, the relative age differences for the 4R and 7R-alleles calculated from intra-allelic variability remain large, while their population frequency suggests they are both ancient.

The simplest hypothesis to account for 1) the observed bias in nucleotide changes (Kₐ/Kₛ), 2) the unusual sequence organization of the DRD4 7R-allele, and 3) the strong LD surrounding this allele, is that the 7R-allele arose as a rare mutational event (or events) that nevertheless increased to high frequency by positive selection. Advantageous alleles usually take a long time to reach a frequency of 0.1, then increase rapidly to high frequencies (>0.9). While it is possible we are observing the recent expansion of a highly advantageous 7R-allele, it is more likely, we suggest, that this “two-allele” DRD4 system (FIG. 3) is an example of balanced selection. Such selection may be more pervasive in the human genome than generally thought. A balanced selection model proposes that both the 4R and 7R-alleles are maintained at high frequencies in human populations. A variety of mechanisms could be proposed for such balanced selection, ranging from heterozygote advantage to frequency-dependent selection. According to evolutionary game theory (Smith, J. M. Evolution and the Theory of Games. (1982) Cambridge: Cambridge University Press), the evolutionary payoff for a particular kind of personality will depend on the existing distribution of personality types. For example, high aggression may lead to high fitness if almost everyone is meek, but might result in low fitness when very common, as aggressive individuals would suffer the penalties of frequent conflict. This type of frequency-dependent selection might be expected to apply to many types of psychological variation, including those associated with this particular neurotransmitter receptor.

Alternative explanations to the proposed positive selection, such as recent random bottlenecks, population expansion, and/or population admixture, are less likely to account for the observed results. Bottlenecks have certainly occurred during human migration and evolution (Tishkoff, S. A., Dietzsch, E., Speed, W., Pakstis, A. J., Kidd, J. R., Cheung, K., Bonne-Tamir, B., Santachiara-Benerecetti, A. S., Moral, P., and Krings, M. (1996) Science 271, 1380-1387; Chen, C., Burton, M., Greenberger, E., and Dmitrieva, J. (1999) Evol. Hum. Behav. 20, 309-324; Reich, D. E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P. C., Richter, D. J., Lavery, T., Kouyoumjian, R., Farhadian, S. F., Ward, R., and Lander, E. S. (2001) Nature 411, 199-204), and have undoubtedly influenced the current worldwide DRD4 allele frequency. Numerous population studies on other genes have shown that an “Out of Africa” constriction of allele diversity (and an increase in LD) likely occurred. In the present study, a greater diversity (and lower LD) was found for African DRD4 4R-alleles in comparison to the remainder of our population sample, which is consistent with the “Out of Africa” hypothesis. While one could argue that the 7R-allele frequency was increased by chance during the Out-of-Africa expansion, this does not explain the unusual lack of diversity in African 7R-alleles. The most common L₁L₂-7R(1-2-6-5-2-5-4)-A-C haplotype (FIG. 3) is found at frequencies comparable to those found worldwide (>85%). It is difficult to imagine what type of bottleneck could produce such results, i.e., strong worldwide LD for a single allele (DRD4 7R) yet little LD for the remaining alleles. A model that is consistent with the observed results is the “weak Garden of Eden” (wGOE) hypothesis, where the DRD4 4R-allele would be hypothesized to be ancient and present in indigenous populations, while the 7R-allele was spread by the expansion out of (and into) Africa. In such a wGOE hypothesis, positive selection for the DRD4 7R-allele must still be proposed.

Although we suggest that a recent mutational origin and positive selection best account for the DRD4 7R-allele data, another possibility cannot be ruled out. Given the highly unlikely recombination/mutation events required to generate the 7R-allele from the 4R-allele, a possibility worth considering is the importation of this allele from a closely related hominid lineage. What lineage that might be can only be speculated, but Neanderthal populations were present at the approximate time the 7R-allele originated. Under this model, the coalescence time for the 4R and 7R-alleles would then be ancient, with the importation occurring only recently, as measured by LD. Obviously, additional experimental work may clarify these speculations.

For the DRD4 locus, it is unlikely that selection for an adjacent gene can account for the proposed selection, given the distinct and unusual DNA sequence of the DRD4 7R-allele itself. If the DRD4 7R-allele originated roughly 40,000 years ago, one might ask what was occurring at that time in human history. It is tempting to speculate that the major expansion of humans that occurred at that time, the appearance of radical new technology (the upper Paleolithic) and/or the development of agriculture, could be related to the increase in DRD4 7R-allele frequency. Perhaps individuals with personality traits such as novelty seeking, perseverance, etc. drove the expansion (and partial replacement)? The speculation that migration could account for the current 7R-allele distribution has been proposed. In addition to such phenotypic selection, sexual selection could be operating as well. As originally defined by Darwin (Darwin, C. The Descent of Man and Selection in Relation to Sex. (1874) New York: Merrill and Baker), “any advantage which certain individuals have over others of the same sex and species solely in respect of reproduction” will lead to increased offspring. If individuals with a DRD4 7R-allele have personality/cognitive traits that give them an advantage (multiple sexual partners, higher probability for mate selection, etc.) then the frequency of this allele will expand rapidly, depending on the cultural milieu. Perhaps cultural differences can account for some of the observed differences in DRD4 7R-allele frequency. Obviously, determining the exact nature of the DRD4 selection, and its biochemical and behavioral basis, awaits further experimentation. Recent experiments, indicating that individuals with ADHD and possessing this unusual DRD4 7R-allele perform normally on critical neuropsychological tests of attention in comparison to other ADHD probands, point to but one of many areas of future investigation.

One may ask why an allele that appears to have undergone strong positive selection in human populations is nevertheless now disproportionately represented in individuals diagnosed with ADHD. The CVCD hypothesis proposes that common genetic variation is related to common disease, either because the disease is a product of a new environment (so that genotypes associated with the disorder were not eliminated in the past) or because the disorder has small effect on fitness (because it is late onset). For early onset disorders (such as autism, ADHD, etc.) we suggest entertaining the possibility that predisposing alleles are in fact under positive selection, and only result in deleterious effects when combined with other environmental/genetic factors. In this context, it is possible that prior selective constraints are no longer operating on this gene. It is also possible to speculate, however, that the very traits that may be selected for in individuals possessing a DRD4 7R-allele may predispose behaviors that are deemed inappropriate in the typical classroom setting, and hence diagnosed as ADHD.

High Prevalence of Rare Dopamine Receptor D4 (DRD4) Alleles in Children Diagnosed with Attention Deficit Hyperactivity Disorder (ADHD)

Associations have been reported of the 7-repeat (7R) allele of the human dopamine receptor D4 (DRD4) gene with both the personality trait of novelty seeking and attention deficit/hyperactivity disorder (ADHD). The increased prevalence of the 7R-allele in ADHD probands is consistent with the common variant-common disorder (CVCD) hypothesis, which proposes that the high frequency of many complex genetic disorders is related to common DNA variants. Based on the unusual DNA sequence organization and strong linkage disequilibrium surrounding the DRD4 7R-allele, we proposed above that this allele originated as a rare mutational event that nevertheless increased to high prevalence in human populations by positive selection (see also, Ding et al., Proc. Natl. Acad. Sci. USA 99, 309-314, 2002). We have now determined, by DNA resequencing of 250 DRD4 alleles obtained from 132 ADHD probands, that most ADHD 7R-alleles are of the conserved haplotype found in our previous 600 allele worldwide DNA sample. Interestingly, however, half of the 24 haplotypes uncovered in ADHD probands were novel (not one of the 56 haplotypes found in our prior population studies). Over 10 percent of the ADHD probands had these novel haplotypes, most of which were 7R-allele derived. The probability that this high incidence of novel alleles occurred by chance in our ADHD sample is much less than 0.0001. These results suggest that allelic heterogeneity at the DRD4 locus may also contribute to the observed association with ADHD.

Attention Deficit Hyperactivity Disorder (ADHD) is a neurobehavioral disorder defined by symptoms of developmentally inappropriate inattention, impulsivity, and hyperactivity with early onset. Current estimates indicate that 3-6% of school age children are diagnosed with ADHD, making it the most prevalent disorder of childhood. While the broad DSM-IV phenotype of ADHD almost certainly has multiple biological etiologies, numerous family, twin and adoption studies have documented a strong genetic basis (Faraone, S V, Biederman, J. Genetics of attention-deficit hyperactivity disorder. Child Adolesc Clin North Am 1994; 3, 285-291). However, given high cross-national variation in the recognition and treatment of ADHD, we proposed that the ADHD Combined type (DSM-IV) without serious comorbidity should be used as a “refined” phenotype in biological and genetic research (Swanson, J M, Sergeant, J A, Taylor, E, Sonuga-Barke, E J S, Jensen, and Cantwell, D P. Attention deficit disorder and hyperkinetic disorder. Lancet 1998; 351, 429-433).

Despite the high heritability of ADHD, initial genome scan studies have failed to identify genes of major effect, although a region on chromosome 16p13 has been implicated in subsequent studies by the same group. Such negative results are not unexpected for a complex genetic disorder like ADHD, where phenotypic heterogeneity is likely, and the practical but (to date) restricted sample sizes limit statistical power. Candidate gene studies, on the other hand, require much smaller sample sizes to achieve the same statistical power. The efficacy of a dopamine agonist drug, methylphenidate, in the treatment of ADHD has suggested that genes in the dopamine pathway may be involved in the disorder's etiology. This dopamine hypothesis of ADHD suggests a number of candidate genes that could logically be tested for their association with the disorder. The draft human genome sequence has provided information sufficient to examine multiple candidate genes in parallel, often representing most of the proteins in a relevant biochemical pathway.

One of these candidate genes, DRD4, located near the telomere of chromosome 11p, is one of the most variable human genes known. Most of this diversity is the result of length and single nucleotide polymorphism (cSNP) variation in a 48 bp tandem repeat (VNTR) in exon 3, encoding the third intracellular loop of this dopamine receptor. Variant alleles containing two (2R) to eleven (11R) repeats are found, with the resulting proteins having 32 to 176 amino acids at this position.
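The 32 to 176 amino acid range follows from the repeat length: each 48 bp repeat encodes 48/3 = 16 amino acids. A minimal sketch of that arithmetic, with hypothetical names:

```python
REPEAT_LENGTH_BP = 48      # length of one exon 3 VNTR repeat
BASES_PER_CODON = 3


def vntr_amino_acids(repeat_count: int) -> int:
    """Amino acids encoded by an exon 3 VNTR allele with `repeat_count` repeats."""
    return repeat_count * REPEAT_LENGTH_BP // BASES_PER_CODON


if __name__ == "__main__":
    for repeats in (2, 4, 7, 11):
        print(f"{repeats}R: {vntr_amino_acids(repeats)} amino acids")
```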

A number of investigations have found associations between particular alleles of this highly variable gene and behavioral phenotypes. While some studies have suggested that the 7R-allele of the DRD4 gene might be associated with the personality trait of novelty seeking (Kluger, A N, Siegfried, Z, and Ebstein, R P. A meta-analysis of the association between DRD4 polymorphism and novelty seeking. Mol Psychiatry 2002; 7, 712-717), the most reproduced association is between the 7R-allele and attention deficit/hyperactivity disorder (ADHD). Above, we showed by DNA resequencing/haplotyping of 600 DRD4 alleles, representing a worldwide population sample, that the origin of 2R- through 6R-alleles can be explained by simple one-step recombination/mutation events. In contrast, the 7R-allele is not simply related to the other common alleles, differing by greater than 6 recombinations/mutations. Strong linkage disequilibrium (LD) was found between the 7R-allele and surrounding DRD4 polymorphisms, suggesting this allele is at least 5-10 fold “younger” than the common 4R-allele. Based on an observed bias towards nonsynonymous amino acid changes, the unusual DNA sequence organization, and the strong LD surrounding the DRD4 7R-allele, we proposed that this allele originated as a rare mutational event that nevertheless increased to high frequency in human populations by positive selection.

Why is the DRD4 7R allele, which arose recently and underwent strong positive selection, nevertheless now disproportionately represented in individuals diagnosed with ADHD? We suggested that selection for an adjacent polymorphism was unlikely, given the distinct and unusual DNA sequence organization of the DRD4 7R allele itself. The DRD4 7R allele is at moderate prevalence in most populations that have been examined for ADHD (approximately 10-15%). Therefore, the approximate two-fold increase in DRD4 7R allele frequency in ADHD probands (λ=1.9), calculated from a recent meta-analysis, is consistent with the Common Variant-Common Disorder (CVCD) hypothesis (also called the Common Disease-Common Variant hypothesis) (Reich, D E, Lander, E S. On the allelic spectrum of human disease. Trends in Genetics 2001; 17, 502-510). In the CVCD hypothesis, the high prevalence of a given disorder (and its associated alleles) is attributed to either 1) the interaction with a new environment (such that genotypes associated with the disorder were not eliminated in the past) or 2) the small effect of the disorder on fitness (because it is late onset). We suggest a third possibility. Perhaps predisposing alleles in fact are under positive selection, and only result in deleterious effects when combined with other environmental/genetic factors. This would explain the high prevalence of common disorders in the population, since the selected allele would only be deleterious in a small fraction of those individuals carrying it. Positive selection for particular human alleles may, in fact, be common (Harpending, H and Rogers, A. Genetic perspectives on human origins and differentiation. Ann Rev Genomics Hum Genet 2000; 1, 361-385; Tishkoff, S A, Varkonyi, R, Cahinhinan, N, Abbes, S, Argyropoulos, G, Destro-Bisol, G, et al. Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 2001; 293, 455-462), contributing to the observation of unexpectedly large blocks of LD in the human genome (Reich, D E, Cargill, M, Bolk, S, Ireland, J, Sabeti, P C, Richter, D J, et al. Linkage disequilibrium in the human genome. Nature 2001; 411, 199-204; Daly, M J, Rioux, J D, Schaffner, S F, Hudson, T J, and Lander, E S. High-resolution haplotype structure in the human genome. Nature Genetics 2001; 29, 229-232; Patil, N, Berno, A J, Hinds, D A, Barrett, W A, Doshi, J M, Hacker, C R, et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 2001; 294, 1719-1723; Sabeti, P C, Reich, D E, Higgins, J M, Levine, H Z P, Richter, D J, Schaffner, S F, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 2002; 419, 832-837).

It is a reasonable hypothesis that high prevalence human genetic disorders will be related to some common variants in the population. However, it is unclear that single common variants will be the only relevant variants. Alleles at low prevalence, most of which have not been identified by current SNP searches targeting a small sample size (The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001; 409, 928-933), could also contribute to complex disease (Pritchard, J K. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 2001; 69, 124-137). Of the hundreds of “single hit” disease genes identified to date, the vast majority contain hundreds of “private” mutations that alter protein function. In order to test this Rare Variant-Common Disorder (RVCD) model for complex disease, much greater depth of DNA resequencing must be conducted, ideally in individuals enriched for the putative mutant alleles (i.e., probands).

All previous studies of the DRD4/ADHD association have defined alleles based only on PCR length differences. Hence, it is possible that specific sequence variants are actually associated with the disorder. For example, one could imagine that the selected DRD4 7R allele might have a higher mutation rate than the common 4R allele, and it is in fact these variant 7R alleles that predispose to ADHD. Given the large sequence diversity of this gene, in which 56 different exon 3 haplotypes were uncovered in 600 chromosomes obtained from a worldwide sample, we decided that direct resequencing of DNA obtained from ADHD probands was the only method that could answer this question.

Here, we confirm the increased prevalence of DRD4 7R alleles in individuals diagnosed with the refined phenotype of ADHD (La Hoste, G J, Swanson, J M, Wigal, S B, Glabe, C, Wigal, T, King, N and Kennedy, J L. Dopamine D4 receptor gene polymorphism is associated with attention deficit hyperactivity disorder. Mol Psychiatry 1996; 1, 21-24). By DNA resequencing of 250 DRD4 alleles obtained from 132 ADHD probands, we show that most ADHD associated 7R-alleles are of the conserved haplotype found in our previous 600 allele worldwide DNA sample. Interestingly, however, over 10 percent of the ADHD probands had novel DRD4 haplotypes, not previously found in our worldwide allele sample. The probability that this high prevalence of novel alleles occurred by chance in our ADHD sample is much less than 0.0001. Most of these novel haplotypes were 7R-allele derived. These results suggest that allelic heterogeneity (the RVCD model) may also be contributing to the association of the DRD4 locus with ADHD, as is routinely found for “single-gene” genetic disorders.

Materials and Methods

Clinical. ADHD probands were recruited to participate in either clinical trials or the Multimodality Treatment Study of Children with ADHD (MTA; MTA Cooperative Group. A 14-month randomized clinical trial of treatment strategies for attention deficit/hyperactivity disorder. Arch Gen Psychiatry 1999; 56, 1073-1086) at the University of California, Irvine. The refined phenotype of ADHD was diagnosed by a research assessment battery described in detail elsewhere (Hinshaw, S P, March, J S, Abikoff, H, Arnold, L E, Cantwell, D P, Conners, C K, et al. Comprehensive assessment of childhood attention-deficit hyperactivity disorder in the context of a multisite, multimodal clinical trial. J Attention Disorders 1997; 1, 217-234), that includes psychiatric interviews and questionnaires about the symptoms of the disorder and other psychopathological behavior related to comorbid disorders. Instruments used included the Diagnostic Interview Schedule for Children, Fourth Version (DISC-IV), the SNAP-IV Rating Scale and a locally developed family and developmental history questionnaire. In addition, measures of ability and achievement were obtained using the Wechsler Intelligence Scale for Children, Third Revision (WISC-III) and the Wechsler Individual Achievement Test (WIAT). The inclusion criteria included a DSM-IV diagnosis of ADHD-Combined Type, which requires the endorsement of at least six of the nine symptoms of inattention and six of the nine symptoms of hyperactivity/impulsivity. High cutoffs on parent and teacher ratings of ADHD items on the SNAP rating were required. Subjects with an IQ score on the WISC-III <80 were excluded. Information was also obtained for oppositional defiant disorder (ODD), but a comorbid diagnosis of ODD did not exclude the subject. A diagnosis of other comorbid disorders (such as Tourette Syndrome), or treatment of symptoms of other disorders with non-stimulant psychotropic drugs, were exclusion criteria for this study.

Establishing Cell Lines and DNA Purification. Lymphoblastoid cell lines were established for all ADHD probands. Methods for transformation, cell culture, and DNA purification have been described above (see also, Chang, F-M, Kidd, J R, Livak, K J, Pakstis, A J, and Kidd, K K. The world-wide distribution of allele frequencies at the human dopamine D4 receptor locus. Hum Genetics 1996; 98, 91-101).

PCR amplification and DNA sequencing. The DRD4 exon 3 VNTR was amplified with primer sets described previously (5′-CGTACTGTGCGGCCTCAACGA-3′ (SEQ. ID NO. 4) and 5′-GACACAGCGCCTGCGTGATGT-3′ (SEQ. ID NO. 5); 705 nucleotide product for the 4R-allele). PCR reactions were conducted in 25 microliter volumes, containing 10 ng genomic DNA, 200 micromolar dXTPs, 0.5 micromole of each primer, 1× PCR buffer (Qiagen), 1× Q-solution (Qiagen) and 0.625 units Taq DNA polymerase (Qiagen). Amplification was performed using Perkin-Elmer 9700 thermal cyclers. A 20 second, 96 degrees C. hot start was used, followed by 40 cycles of 95 degrees C. for 20 seconds and 68 degrees C. for 1 minute. Following a 4-minute chase at 72 degrees C., excess primers were eliminated with 0.5 units of Shrimp Alkaline Phosphatase (SAP, Amersham Life Science), 0.1 unit of Exonuclease I (Exo I, Amersham Life Science) and 1× SAP buffer (Amersham Life Science). The SAP/Exo I reaction was carried out at 37 degrees C. for 1 hour, followed by a 15-minute heat inactivation at 72 degrees C. The DNA from the SAP/Exo I reaction was used directly for DNA sequencing. For individuals heterozygous for DRD4 alleles, the two allelic PCR products were first separated on 1.2% agarose gels. DNA cycle sequencing was conducted by standard techniques, using ABI 3100 and 3700 automated sequencers. Overall PCR/resequencing success was greater than 95%. One allele from an ADHD proband, 9R(1-8-25-5-2-5-2-23-4), was included in our prior worldwide sample. DNA sequences of the novel DRD4 haplotypes reported herein have been submitted to GenBank (Accession numbers AY151027-AY151038).

Analysis of sequence data: Analysis of sequence data was accomplished using PHRED, PHRAP, POLYPHRED and CONSED (Ewing, B, Hiller, L, Wendl, M C, and Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998; 8, 175-185; Ewing, B and Green, P. Base-calling of automated sequencer traces using phred II. Error probabilities. Genome Res 1998; 8, 186-194; Nickerson, D A, Tobe, V O, and Taylor, S L. Polyphred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res 1997; 14, 2745-2751; Nickerson, D A, Taylor, S L, Weiss, K M, Clark, A G, Hutchinson, R G, Stengard, J, et al. DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genetics 1998; 19, 233-240). These programs are used to clean and assemble the sequence files, and aid in the detection of DNA polymorphism. For every position in the DRD4 consensus sequence, POLYPHRED examines each sample sequence for evidence of polymorphism/heterozygosity. The rank limit for identifying a position as polymorphic is under user control. Based on our experience, we have configured POLYPHRED to identify all potential polymorphisms of rank 1-4, which are then independently evaluated by two skilled investigators.

Capture of individual genotypes/haplotypes into a database (SNPMAN). The collection of SNPs into a relational database is done via an in-house software package we have designated SNPMAN. SNPMAN is a package of 3 main programs written originally in PERL and SQL and now available in both binaries and open source format. The first program (SNPMAN) is designed to collect the SNP information from POLYPHRED output files and transform it into acceptable SQL command files, later to be executed by a database operator (DBO). The second program (MANIP) is the CONSED-addon extension that allows an experienced chromatogram reader to adjust or delete database information in case of false positive or false negative polymorphisms. The last program (GIMMEPRETTYBASE) in the SNPMAN package converts existing polymorphism tables into acceptable input files for visual genotyping via VG.

Statistical Analysis. Allele distributions were compared using Fisher's exact test for a 2×k table, as implemented in SAS (v.6.12, running on a SUN Ultra2 Enterprise workstation). In our prior worldwide sample (above), all DRD4 repeat lengths except for 4R were oversampled by a factor of two. This was corrected for before comparisons were conducted with the present sample.

Results

DNA was isolated from 132 probands diagnosed with the refined phenotype of ADHD, sequentially identified as part of ongoing research and clinical trials programs at the UCI Child Development Center (see Methods). Table 2 gives the demographics, ADHD symptoms and psychometric test scores of these probands. As expected, the majority (80%) of individuals were of European ancestry and male. On the SNAP, average rating per item summary scores of inattention and hyperactivity/impulsivity above 2.0 are considered severe. The average SNAP for this group of probands was 2.22 (Table 2). Ratings were also obtained for oppositional defiant disorder (ODD), often found to be comorbid with ADHD. The observed average for ODD (1.62) was significantly higher than for population norms. Other psychometric measures of IQ (WISC) and achievement (WIAT) were in the normal range for the group (Table 2).

TABLE 2
Demographics, ADHD symptoms, and psychometric test scores of the ADHD probands.

                             Females   Males    Total
                             (26)      (106)    (132)    SD
Demographics
  Age                        8.62      8.83     8.78     1.88
  % European                                    79.7
  % Hispanic                                    11.4
  % African American                            3.6
  % Asian                                       2.8
  % Native American                             2
  % Pacific Island                              0.4
ADHD symptoms
  Inattention average        2.53      2.3      2.41     0.52
  Hyperactivity/impulsivity  2.07      1.96     2.01     0.74
  ADHD average               2.29      2.15     2.22     0.51
  ODD average                1.68      1.57     1.62     0.81
Psychometric tasks
  WISC block design          101.9     113.3    107.7    33.9
  WISC vocabulary            96.9      105.4    101.2    30.4
  IQ average                 99.4      109.4    104.4    25.7
  WIAT reading               96.5      100.5    98.5     14.6
  WIAT math                  100.6     101.4    101.9    13.8
  WIAT spelling              96.6      97.8     97.2     13.8
  Achievement average        97.8      99.9     98.9     12.5

The exon 3 VNTR region of the DRD4 gene was amplified from these DNAs, and the distribution of DRD4 genotypes obtained in this sample is shown in Table 3. As reported in numerous other studies, including our own, the frequency of ADHD individuals with at least one DRD4 7R allele is approximately two-fold greater (43.2%) than found in ethnically matched control individuals. Interestingly, the observed frequency of 2R and 3R alleles was also increased in this ADHD sample (Table 3). In European populations the observed allele frequency for DRD4 is 2R=0.07, 3R=0.03, 4R=0.73, 5R=0.01, 6R=0.02, 7R=0.12, 8R<0.01, 9R<0.001 (2N=1652; ref. 20 and 22 and unpublished data). Calculating an expected genotype distribution (assuming Hardy-Weinberg equilibrium, Table 3) indicates that only 22% of Europeans should have a 7R/x genotype, consistent with prior experimental control data from our research. Adjusting these values for the increased frequency of 7R alleles in some non-European populations cannot account for the increased frequency in our predominantly European ancestry ADHD sample (Table 2).

TABLE 3
Genotypes of 132 ADHD Probands.

Genotype   Observed   Expected
2R/4R      20(19)     14
3R/3R      1          <1
3R/4R      9(8)       6
4R/4R      43(41)     70
4R/6R      2(1)       4
2R/7R      5          2
3R/7R      3          1
4R/7R      42         23
6R/7R      1          <1
7R/7R      4          2
4R/8R      1          1
4R/9R      1          <1
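The Hardy-Weinberg expectation used for Table 3 can be illustrated with a short calculation. The following is a minimal sketch, not the procedure actually used for the table: it takes the European allele frequencies quoted above, omits the 8R and 9R alleles because only upper bounds are given for them, and uses a hypothetical helper name (expected_genotype_counts).

```python
from itertools import combinations_with_replacement

# European DRD4 allele frequencies as quoted in the text.
# 8R and 9R are given only as upper bounds and are left out of this sketch.
ALLELE_FREQ = {"2R": 0.07, "3R": 0.03, "4R": 0.73, "5R": 0.01, "6R": 0.02, "7R": 0.12}


def expected_genotype_counts(freqs, n_individuals):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    Homozygotes have probability p*p, heterozygotes 2*p*q.
    Keys look like "4R/7R", with the alleles in sorted order.
    """
    counts = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            prob = freqs[a] ** 2
        else:
            prob = 2 * freqs[a] * freqs[b]
        counts[f"{a}/{b}"] = prob * n_individuals
    return counts


counts = expected_genotype_counts(ALLELE_FREQ, 132)
for genotype in ("4R/4R", "4R/7R", "7R/7R"):
    print(genotype, round(counts[genotype], 1))
```

Because the quoted allele frequencies are rounded, some values computed this way can differ by one from the rounded entries of Table 3.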

DNA sequence analysis of 250 DRD4 alleles obtained from these ADHD probands found 24 different haplotypes (Table 4). No data were obtained on 14 alleles (5.3%; two 2R, seven 4R and five 7R alleles) due to PCR and/or sequencing failures. Altogether, we screened over 200,000 bp of genomic DNA and 1,132 48-bp repeats. Interestingly, only half (12/24) of the observed haplotypes (Table 4) were identified previously in our analysis of 600 DRD4 alleles obtained from a worldwide population sample (see GenBank accession nos. AF395210-AF395264). For example, using our proposed nomenclature for DRD4 haplotypes (FIG. 2), the majority of 7R alleles found in our ADHD probands (45/55 = 81.8%) are the common 7R(1-2-6-5-2-5-4) haplotype (Table 4). In this nomenclature, the numbers in brackets refer to different 48 bp repetitive sequence motifs (FIG. 2). Likewise, the majority of 2R and 4R alleles were the common 2R(1-4) and 4R(1-2-3-4) haplotypes, respectively. These three common alleles (2R, 4R, and 7R) account for 87.2% of the observed alleles (Table 4), similar to the proportion obtained in our 600 allele population sample. The remaining 9 alleles are rare 3R, 4R, 6R, and 7R variants observed previously (Table 4).

TABLE 4
Haplotypes of 250 DRD4 exon 3 alleles from 132 ADHD probands.

Allele   N      Haplotype
2R       23     1-4
3R       14
         8      1-7-4
         3      1-9-4
         2      1-2-20
         1      1-6-4
4R       156
         150    1-2-3-4
         2      1-2-14-4
         2      1-2-5-4
         1      1-2-6-4
         1      1-26-3-4
6R       3
         1      1-2-3-2-3-4
         1      1-2-6-5-2-20
         1      1-2-6-2-5-4
7R       55
         45     1-2-6-5-2-5-4
         2      1-2-6-5-2-5-19
         2      1-2-6-1-2-3-4
         1      1-2-6-5-2-23-4
         1      1-2-6-5-8-5-4
         1      1-2-14-5-2-5-4
         1      1-2-3-17-2-5-4
         1      1-8-25-5-2-5-4
         1      1-2-6-5-2-3-4
8R       1      1-2-6-26-5-2-3-4
9R       1      1-8-25-5-2-5-2-23-4

N: allele number identified by sequence analysis; haplotype nomenclature is described in FIG. 1. Alleles in normal font were identified previously in a survey of 600 world-wide alleles.²² Alleles in bold are unique to this study.

The other half of the observed haplotypes were unique, not identified in our extensive prior analysis (Table 4). Excluding the common variants, expected to be present in all samples, sixty percent (12/20) of rare (<0.01 frequency) variants found in this ADHD sample were unique. Fifteen ADHD probands had one of these 12 unique DRD4 haplotypes (15/132=11.4%). For seven of these probands, parental DNA was available. PCR resequencing indicated that the variant allele was present in one of the parents, and not a new mutation. All but one of these 12 novel alleles produce an altered amino acid sequence in the resulting DRD4 protein compared to the common allele (FIG. 4). For example, the observed 4R(1-2-6-4) variant would substitute a Gly for a Ser and a Pro for a Gln in comparison to the common 4R(1-2-3-4) variant (Table 4). This result is similar to our prior population studies on the DRD4 gene, where 87% of the observed rare variants altered the amino acid sequence of the resulting protein.

The origin of most of these newly observed variants can be inferred to be 7R allele derivatives, based on their nucleotide sequence (FIGS. 4 and 5). The 5 and 6 variant motifs (FIG. 2) are diagnostic for the 7R allele, found only in this allele and its derivatives. Ten of the 12 haplotype variants contain these motifs (Table 4), and hence likely arose as recombination/mutation events involving a 7R allele (FIG. 4). For example, the 4R(1-2-5-4) allele likely arose as a recombination event between a 4R(1-2-3-4) allele and a 7R(1-2-6-5-2-5-4) allele. Genotyping six of these variant alleles for flanking SNPs diagnostic for the 4R and 7R alleles confirmed their hypothesized origin (data not shown; FIG. 4). The finding that the majority of these rare variants are derived from 7R alleles should be contrasted with our prior population studies of the DRD4 gene, in which rare variants were found to be equally derived from 4R and 7R alleles. There is an approximate two-fold increase in rare 7R alleles in this ADHD sample in comparison to our prior population sample (18.2% versus 11.0%; Table 4).
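The haplotype nomenclature and the motif-based inference of 7R origin lend themselves to a small illustration. The sketch below is hypothetical (the function names parse_haplotype and has_7r_diagnostic_motif are not part of any software described here); it assumes only what the text states, namely that a haplotype is written as a repeat count followed by a dash-separated list of 48 bp motif numbers, and that motifs 5 and 6 are diagnostic for the 7R allele and its derivatives.

```python
import re

# A haplotype name such as "7R(1-2-6-5-2-5-4)": repeat count, then motif list.
_HAPLOTYPE = re.compile(r"^(\d+)R\s*\(([\d-]+)\)$")

# Motifs described in the text as diagnostic for the 7R allele.
DIAGNOSTIC_7R_MOTIFS = (5, 6)


def parse_haplotype(name):
    """Split a haplotype name into (repeat_count, [motif numbers])."""
    match = _HAPLOTYPE.match(name.strip())
    if match is None:
        raise ValueError(f"not a haplotype name: {name!r}")
    repeats = int(match.group(1))
    motifs = [int(part) for part in match.group(2).split("-")]
    if len(motifs) != repeats:
        raise ValueError(f"{name!r}: {repeats}R allele lists {len(motifs)} motifs")
    return repeats, motifs


def has_7r_diagnostic_motif(name):
    """True if the haplotype contains a motif diagnostic for 7R (5 or 6)."""
    _, motifs = parse_haplotype(name)
    return any(motif in DIAGNOSTIC_7R_MOTIFS for motif in motifs)


print(has_7r_diagnostic_motif("4R(1-2-5-4)"))  # variant discussed in the text
print(has_7r_diagnostic_motif("4R(1-2-3-4)"))  # common ancestral 4R haplotype
```

A motif check of this kind only suggests a 7R origin; the text confirms the inferred origin for six variants by genotyping flanking SNPs.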

Including these 7R-related sequence variants in the 7R allele category removes 5 individuals from the non-7R category, originally classified based on their PCR fragment length (numbers in brackets in Table 3). Altogether, individuals with 7R and derivative 7R alleles account for 47% of the ADHD proband population (Table 3).

Twenty percent of the ADHD DRD4 alleles sequenced in this study are of non-European origin (Table 2). However, 33% (5/15) of the individuals with novel rare alleles were non-European in genetic origin. While this difference is not statistically significant, it is possible that population stratification could account for a portion of the observed difference. Our prior worldwide sequence sample included 220 European DRD4 alleles, as well as 164 Asian, 122 African, 76 North and South American, and 18 Pacific Island ancestry alleles. Non-4R alleles were oversampled approximately two-fold in this prior study. Given the ethnic breakdown of our ADHD probands (Table 2), then, our prior worldwide resequencing sample can serve as an extensively “oversampled” control, in which we have comparable numbers of European origin alleles, and 10-20 fold larger numbers of non-European alleles.

Sixty-seven different haplotype variants of DRD4 were seen in either our prior population sample, our ADHD sample (Table 4), or both. Sixty of these haplotypes are at low (<0.01) frequency. We can therefore ask a simple question: How likely is it, assuming a pool of uncommon DRD4 alleles, that these two samples (population control and ADHD) would give the observed results? Most of the rare alleles were found only once, hence we can only estimate their frequency in the population. Our initial sample size of 600 chromosomes, however, is expected to detect eighty percent of variants at a frequency of 0.002 or greater. Based on DRD4 allele frequency distributions (Table 4 and above), where the six common 2R-7R alleles account for >90% of the observed alleles, we can estimate that there can be, at most, 85 different DRD4 alleles at frequencies greater than 0.001. At a minimum, therefore, we have identified 79% (67/85) of DRD4 alleles with a frequency greater than 0.001. Alleles less frequent than 0.001 would be found rarely in population samples of the current size, and hence cannot contribute significantly to the observed distributions.

One can consider the possibility, then, that among a pool of uncommon alleles, there were 12 undetected alleles (on 15 chromosomes) that happened by chance to occur among the 250 chromosomes obtained from ADHD individuals. Likewise, one can consider the possibility that these 12 alleles were not found among 600 random chromosomes. We considered a range of allele frequencies for these 12 alleles, from 1/400 each to 1/1000 each. For each set of allele frequencies, the probability of seeing none of these 12 new alleles among the 600 chromosomes examined previously can be easily calculated as a multinomial probability (Probability “A”). Likewise, the probability of seeing 9 of these new alleles once, and 3 twice, among 250 chromosomes can be calculated (Probability “B”). For all sets of allele frequencies, either probability “A” or probability “B” is much less than 0.0001. It is extremely unlikely that the distribution of alleles in these two samples has occurred by chance.
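One way to read the two multinomial probabilities is sketched below. This is an illustrative reconstruction under stated assumptions, not the calculation actually performed: all 12 alleles are given the same frequency f, chromosomes are treated as independent draws, and the function names are hypothetical.

```python
import math


def prob_none_seen(f, n_alleles=12, n_chromosomes=600):
    """Probability "A": none of the alleles (each at frequency f)
    appears among n_chromosomes independent chromosomes."""
    return (1 - n_alleles * f) ** n_chromosomes


def prob_observed_pattern(f, n_alleles=12, n_twice=3, n_chromosomes=250):
    """Probability "B": every allele is seen, n_twice of them exactly
    twice and the rest exactly once, among n_chromosomes chromosomes."""
    n_once = n_alleles - n_twice
    hits = n_once + 2 * n_twice  # chromosomes carrying one of the alleles
    # Multinomial coefficient n! / (1!^n_once * 2!^n_twice * (n - hits)!)
    arrangements = math.perm(n_chromosomes, hits) // (2 ** n_twice)
    # Any n_twice of the alleles may be the ones seen twice.
    which_twice = math.comb(n_alleles, n_twice)
    other = (1 - n_alleles * f) ** (n_chromosomes - hits)
    return which_twice * arrangements * f ** hits * other


for f in (1 / 400, 1 / 1000):
    print(f, prob_none_seen(f), prob_observed_pattern(f))
```

Under these assumptions, at each end of the stated frequency range at least one of the two probabilities falls below 0.0001, in line with the statement above.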

We also considered the possibility that this difference is related not to the diagnosis of ADHD, but rather to population stratification. Indeed, one of the reasons we sequenced such a large worldwide sample was to address this issue. We constructed a series of comparison groups from our worldwide population sample. Each comparison group contained the 220 alleles from samples of European origin. Added to this was a random selection from the remaining non-European samples to approximate the ethnic distribution of the ADHD sample (Table 1). In all cases the allele distribution differed significantly between the ADHD sample and the comparison group (p<<0.0001). It is extremely unlikely, therefore, that population stratification and undetected ethnic bias can account for the distribution differences in our population and ADHD samples. We conclude, then, that the most likely reason for the observed differences was our ascertainment of this sample by diagnosis of ADHD, and that variants present at low frequency in the general population were “enriched” in the ADHD sample.

Discussion

The increased frequency of the DRD4 7R allele in ADHD probands is consistent with the predictions of the CVCD hypothesis (FIG. 5). By DNA resequencing from probands diagnosed with the refined phenotype of ADHD, we determined that the majority (83%) of 7R alleles in these individuals were of the common 7R(1-2-6-5-2-5-4) haplotype found previously (Table 3). However, we uncovered an unusually high prevalence (50%) of novel haplotypes in the 24 haplotypes observed in our sample, most of them 7R allele derivatives (Table 4). Greater than 10% of ADHD probands had one of these rare alleles. Including these rare derivatives (determined by sequence analysis) in the “7R” class increased the number of ADHD individuals with 7R alleles from 43.2% to 47% (Tables 3 and 4). It is impossible to know without further biochemical/physiological/behavioral experimentation if these derivatives are functionally equivalent/related to 7R alleles (see below). It is likely, however, that all previous studies of the DRD4/ADHD association modestly underestimated the relative risk by only examining repeat length rather than DNA sequence.

What can account for the high frequency of novel alleles uncovered in the present study? If recombination/mutation were random, one would expect that the majority of derivative alleles would have 4R origins, since this is the most common allele, even in ADHD probands. The DRD4 4R allele is also older than the 7R allele, and hence there has been greater time to accumulate mutations in this allele (unless they have been selected against). In our prior population study, approximately equal numbers of 4R and 7R derivative alleles were uncovered, suggesting a mutation/recombination bias toward 7R alleles (or a stronger selection against 4R variants). In comparison to our prior population survey, however, over 90% of the rare derivative alleles in this ADHD sample have 7R origins (FIG. 4).

We estimate that there are fewer than 85 DRD4 alleles with population frequency greater than 0.001, and we have identified a minimum of 79% of these alleles. While there could be hundreds of extremely rare DRD4 alleles (at a population frequency of 0.0001), such alleles could only contribute a few examples to our original population sample. Therefore, given the sample sizes used in this and our prior population study, it is expected that, at most, 2-3 alleles might be found only in one sample and not the other. It is extremely unlikely (p<<0.0001), therefore, that finding 12 new alleles (on 15 chromosomes) in the ADHD population was due to chance or population stratification. We propose, then, that our ascertainment of the sample by diagnosis of ADHD was the reason for this observed increase in derivative DRD4 7R alleles.

Further studies, including more extensive population sampling, can refine the number and frequency distribution of rare DRD4 alleles. In particular, it would be informative to know if rare DRD4 alleles exhibit biased geographic/ethnic ancestry distributions. Such information would be essential for the design and interpretation of replicate studies of the current work. In addition, family based analyses can help determine if rare alleles are preferentially transmitted to ADHD probands. However, for behavioral disorders such as ADHD, such studies should be interpreted with caution. It is common in such disorders to be unable to obtain consent from key members of a trio (mostly fathers). An inability to ascertain a truly “random” sample of parental genotypes (for example, if there is preferential absence of a parent transmitting a putative predisposing gene) could contribute to biases in tests such as the TDT (West, A, Langley, K, Hamshere, M L, Kent, L, Craddock, N, et al. Evidence to suggest biased phenotypes in children with attention deficit hyperactivity disorder from completely ascertained trios. Mol Psychiatry 2002; 7, 962-966).

The high frequency of amino acid changing variants in these rare haplotypes (>90%), and the low probability that we uncovered these variants by chance (p<<0.0001) suggest that allelic heterogeneity is also playing a role in the association of the DRD4 gene and ADHD (RVCD Model, FIG. 5). The finding of allelic heterogeneity for the DRD4/ADHD association should not be surprising, since “private” mutations are found frequently for the majority of “single-hit” genetic diseases, even ones where a particular variant is common. For example, while the common ΔF508 mutation is found in 70% of cystic fibrosis probands, hundreds of rarer mutations have also been identified (Serre, J L, Simon-Bouy, B, Mornet, E, Jaume-Roig, B, Balassopoulou, A, Schwarz, M, et al. Studies of RFLP closely linked to the cystic fibrosis locus throughout Europe lead to new considerations in population genetics. Hum Genet 1990; 84, 449-454). There is no strong experimental or theoretical reason why genes associated with complex genetic disorders involving multiple genes should utilize a different mutational spectrum than genes for single-hit disorders. We suggest, then, that both CVCD association and allelic heterogeneity (RVCD) contribute to the association of the DRD4 gene and ADHD (FIG. 5). The observation of increased allelic heterogeneity adds further support to the hypothesis that the DRD4 gene itself, rather than an adjacent variant in strong LD with DRD4, is responsible for the association.

While data exist indicating that DRD4 protein variants containing different VNTR lengths exhibit different biochemical properties (Asghari, V, Sanyal, S, Buchwaldt, S, Paterson, A, Jovanovic, V, and VanTol, H H M. Modulation of intracellular cyclic AMP levels by different human dopamine D4 receptor variants. J Neurochem 1995; 65, 1157-1165; Jovanovic, V, Guan, H-C, and VanTol, H H M. Comparative pharmacological and functional analysis of the human dopamine D4.2 and D4.10 receptor variants. Pharmacogenetics 1999; 9, 561-568), little is known of the effect of sequence (amino acid) differences in this region of the protein. The functional importance of changes at this position in the DRD4 protein, however, in a region that couples to G proteins and mediates postsynaptic effects (Civelli, O, Bunzow, J R, Grandy, D K. Molecular diversity of the dopamine receptors. Annu Rev Pharmacol Toxicol 1993; 32, 281-307), seems likely. For example, many of the observed changes are quite dramatic [e.g., substituting a Pro for a Gln in 4R(1-2-6-4); FIG. 4], and might be expected to alter the DRD4 protein structure/function. Clearly, further biochemical studies would be helpful. Such studies should be interpreted with caution, however. Observed biochemical differences do not necessarily imply differences at the behavioral level. Many genetic/biochemical systems exhibit great buffering capacity, and biochemical variation often has little physiological effect (Hartman, J L, Garvik, B, and Hartwell, L. Principles for the buffering of genetic variation. Science 2001; 291, 1001-1004). Likewise, not finding biochemical differences between DRD4 variant proteins does not imply that functional differences do not exist at a behavioral level. It is often unclear which biochemical parameter is relevant to test, especially for proteins like DRD4, where most of the interacting proteins are as yet unknown. Further, subtle biochemical changes, difficult to detect in vitro and in vivo, can have large effects at the organismal level. The decade long search for the relevant biochemical basis of Huntington Disease, following the identification of the mutation, is but one recent example. For these reasons, we suggest that genetic approaches will remain more powerful than biochemical approaches at detecting associations with behavioral disorders. We therefore suggest that in addition to further biochemical analysis of DRD4 variants, direct genotype/phenotype correlations continue to be pursued, including brain imaging and model organism experiments (Dulawa, S C, Grandy, D K, Low, M J, Paulas, M P, and Geyer, M A. Dopamine D4 receptor-knock-out mice exhibit reduced exploration of novel stimuli. J Neurosci 1999; 19, 9550-9556). It is the physiological/behavioral outcome of genetic variation that is most relevant. The finding that individuals with ADHD who possess a DRD4 7R allele perform normally on critical neuropsychological tests of attention in comparison to other ADHD probands points to but one of many areas of future investigation.

Based on the current work and the hypothesized origin of human DRD4 diversity, we suggest that future studies might group individuals based on DRD4 genotype differently than in the past, when only VNTR length was considered, usually split into 7R(+) and 7R(−) categories. The DRD4 locus appears to behave like a “two-allele” system (4R and 7R) under balanced selection. The common 4R allele appears to be the ancestral allele, with the 7R allele being a much younger allele. All rare variants appear to be recombination/mutation products of these common 4R and 7R alleles (FIG. 4 and above). For example, the 2R allele likely has both a 4R and 7R origin. Hence, simple 7R(+) and 7R(−) categories may not be appropriate divisions, and one should entertain other potential groupings. In particular, one might hypothesize that any amino acid alteration from the conserved ancestral 4R(1-2-3-4) haplotype might lead to altered biochemistry/phenotype. Tests of this hypothesis would group individuals as 4R/4R versus non-4R/4R for purposes of hypothesis testing.

What does the DRD4/ADHD association mean? We have speculated that the very traits that may be selected for in individuals with a DRD4 7R allele may predispose behaviors that are deemed inappropriate in the typical classroom setting and hence diagnosed as ADHD. This environmental mismatch hypothesis (Jensen, P S, Mrazek, D, Knapp, P K, Steinberg, L, Pfeffer, C, Schowalter, J, and Shapiro, T. Evolution and revolution in child psychiatry: ADHD as a disorder of adaptation. J Am Acad Child Adolesc Psychiatry 1997; 36, 1672-1679) has testable predictions, including the potential benefit of altered educational approaches. In this hypothesis, the DRD4 7R subset of individuals diagnosed with ADHD are assumed to have a different, evolutionarily successful behavioral strategy rather than a disorder. Alternatively, we also speculated that DRD4 7R, while selected for in human populations, could have deleterious effects only when combined with other genetic variants. This complex genetic model for ADHD also has testable predictions. One of the many important questions stemming from this hypothesis is the number and nature of these interacting genes. Is DRD4 7R one of only a few (or a few hundred) predisposing alleles?

The DRD4 7R/ADHD association is one of the most reproduced in complex behavioral disorders. However, the approximately two-fold risk associated with the DRD4 7R allele and ADHD has been described as “small”. The implication is that DRD4 7R is but one of many predisposing alleles (a classic QTL; Lynch, M and Walsh, B. Genetic analysis of Quantitative traits (Sinauer Associates, Inc. Sunderland, MA) 1998), and indeed may be only a “modifier” of yet undiscovered predisposing genes. Certainly, this is a possibility. However, while a two-fold risk may be considered small in some contexts, this risk needs to be put in the perspective of observed DRD4 allele frequencies and the predictions of the CVCD hypothesis (FIG. 5).

In the populations of predominantly European ancestry used in most investigations of the DRD4/ADHD association, the allele frequency of DRD4 7R is approximately 12-15%. Therefore, even if the presence of a DRD4 7R allele was a necessary predisposing condition for ADHD (i.e., 100% of ADHD probands had at least one copy of this allele), and assuming Hardy-Weinberg equilibrium, the increase in observed frequency (and relative risk) would be only 3.6 fold (FIG. 5). If only half of ADHD is “caused” by DRD4 7R, then the increase in observed frequency would be 1.8 fold. Common alleles associated with a particular disorder, then, can only exhibit modest increases in allele frequency in affected individuals, and hence have modest relative risks (i.e., small λ). Most current genome scans of complex genetic disorders, including one for ADHD, would not have detected genomic regions with λ<2-3.
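The ceiling on the fold increase follows directly from Hardy-Weinberg carrier frequencies, as the following minimal sketch shows. The function names are hypothetical, and the fraction_attributable argument simply mirrors the proportional scaling used in the text (it does not model 7R carriers among cases that have other causes).

```python
def carrier_fraction(p):
    """Fraction of individuals with at least one copy of an allele at
    frequency p, assuming Hardy-Weinberg equilibrium."""
    return 1 - (1 - p) ** 2


def max_fold_increase(p, fraction_attributable=1.0):
    """Fold increase in carrier frequency among probands if the stated
    fraction of probands all carry the allele (simple scaling, as in the text)."""
    return fraction_attributable / carrier_fraction(p)


# Upper end of the 12-15% allele frequency range quoted above.
print(round(max_fold_increase(0.15), 1))        # allele necessary for all cases
print(round(max_fold_increase(0.15, 0.5), 1))   # allele involved in half of cases
```

At an allele frequency of 0.12 the same calculation gives a carrier fraction of about 22%, the value cited for Europeans in the Results.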

Are λ values less than 2-3 of little significance? Do they imply that the associated allele has little impact on the disorder? On the contrary, they are exactly of the magnitude one expects if the CVCD hypothesis is correct. Likewise, the RVCD model also predicts modest relative risks, if one sums the contributions of all variants in a single gene (FIG. 5). It is informative to propose a simple model for ADHD based on the CVCD hypothesis and the DRD4 7R association (FIG. 7). In rare disorders like Huntington Disease, the disease allele is rare and the allelic relative risk is large (>5,000 fold, FIG. 7). But what if alleles predisposing to ADHD are common in the population? FIG. 7 outlines one such model, in which three different dominant alleles (designated DRD4 7R, b, c in three different genes) interact to predispose to the disorder. In this model, each of these alleles is at polymorphic frequency (0.05-0.12), and it is assumed that any two of them in combination predispose to ADHD. In such hypothetical interacting gene systems, any of the three “disease” alleles (DRD4 7R, b, or c) could also be described as “modifier” alleles, since their presence or absence affects the “penetrance” of the other alleles. Such interacting genetic systems should be common, since most gene products are part of multiprotein assemblies or biochemical pathways. Obviously, many other models could be proposed, involving recessive alleles, additional genes, etc. For example, each predisposing “allele” could be many rare alleles (the RVCD model, FIG. 5) that in total have a frequency of 0.05. However, the model proposed in FIG. 7 is one of the simplest in which interacting alleles are neither necessary nor sufficient. In this example, approximately 5% of individuals would have one of the hypothesized predisposing genotypes [(DRD4 7R/x)(b/x), (DRD4 7R/x)(c/x), (b/x)(c/x)], approximately the observed incidence of ADHD (FIG. 7). None of the predisposing alleles would be either necessary or sufficient to “cause” ADHD. None of the hypothetical predisposing alleles would have a high λ (2-4 fold relative risk, FIG. 7), and none would likely be detected with genome scans of typical size. Yet according to this model, these are the predisposing alleles that are the object of our search. Similar conclusions could be reached for a variety of other likely models.
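The approximately 5% figure for this three-allele model can be checked numerically. The following sketch is an illustration under stated assumptions, not the calculation behind FIG. 7: it assumes the three loci are independent and in Hardy-Weinberg equilibrium, and it picks allele frequencies of 0.12 for DRD4 7R and 0.05 for each of b and c from the 0.05-0.12 range given in the text. The function names are hypothetical.

```python
def carrier_frequency(p):
    """Fraction of individuals carrying at least one copy of a dominant
    allele at frequency p (Hardy-Weinberg equilibrium assumed)."""
    return 1.0 - (1.0 - p) ** 2


def prevalence_any_two(allele_freqs):
    """Probability that an individual carries predisposing alleles at two or
    more loci, treating the loci as independent."""
    carriers = [carrier_frequency(p) for p in allele_freqs]
    n = len(carriers)
    total = 0.0
    # Enumerate every carrier / non-carrier pattern across the loci.
    for mask in range(2 ** n):
        status = [(mask >> i) & 1 for i in range(n)]
        if sum(status) >= 2:
            prob = 1.0
            for is_carrier, freq in zip(status, carriers):
                prob *= freq if is_carrier else (1.0 - freq)
            total += prob
    return total


if __name__ == "__main__":
    # Assumed frequencies: DRD4 7R = 0.12, b = 0.05, c = 0.05.
    prevalence = prevalence_any_two([0.12, 0.05, 0.05])
    print(f"fraction with a predisposing genotype: {prevalence:.3f}")
```

With these assumed frequencies the predicted fraction is a little under 5%, in line with the incidence quoted above; other choices within the stated range give higher values.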

What can be concluded from such models? The observed two-fold increase in DRD4 7R allele frequency in ADHD probands is approximately 54% of the maximum possible (if all ADHD is genetic and related to DRD4 7R). As discussed above, this estimate modestly underestimates the relative risk, since rare 7R derivatives, as uncovered in this study, would not have been identified in prior work. The observed risk is approximately 87% of the maximum possible if 50% of ADHD has a nongenetic cause. If one assumes that ADHD predisposition is related to many different genes/alleles, such values for a single allele are, in fact, unusually high. We conclude, therefore, that the observed DRD4 7R-allele/ADHD association is not “small”, but is surprisingly high in magnitude. It suggests that this allele is associated with a minimum of 25%-50% of the observed cases of ADHD. It further suggests that as few as one or two other common alleles in other genes, in combination with DRD4 7R (FIG. 7), could account for most of the disorder.

The Genetic Architecture of Selection at the Human Dopamine Receptor D4 (DRD4) Gene Locus

Associations have been reported of the 7-repeat (7R) allele of the human dopamine receptor D4 (DRD4) gene with both the personality trait of novelty seeking and attention deficit/hyperactivity disorder. Above, based on the unusual DNA sequence organization of the DRD4 7R VNTR, we proposed that the 7R allele originated as a rare mutational event that increased to high frequency by positive selection (see also, Ding et al., Proc. Natl. Acad. Sci. USA 99, 309-314, 2002). We have now resequenced the entire DRD4 locus from 103 individuals homozygous for 2R, 4R or 7R variants of the VNTR, a method developed to directly estimate haplotype diversity. DNA from individuals of African, European, Asian, North and South American and Pacific Island ancestry was used. 4R/4R homozygotes exhibit little linkage disequilibrium (LD) over the region examined, with more polymorphisms observed in African DNA samples. In contrast, the evidence for strong LD surrounding the 7R allele is dramatic, with all 7R/7R individuals (including those from Africa) exhibiting the same polymorphisms at most sites. By intra-allelic comparison at 18 high frequency polymorphic sites spanning the locus, we estimate that the 7R allele arose at the time of the “out of Africa” human exodus (approximately 42,500 years ago). Further, the pattern of recombination at these polymorphic sites is that expected for selection acting at the 7R VNTR itself, rather than at an adjacent site. We propose a model for selection at the DRD4 locus consistent with these observed LD patterns and the known biochemical and physiological differences between receptor variants.

The human dopamine receptor D4 (DRD4) gene, located near the telomere of chromosome 11p, exhibits an unusual amount of expressed polymorphism (Lichter, J. B., Barr, C. L., Kennedy, J. L., Van Tol, H. H. M., Kidd, K. K. and Livak, K. J. (1993) Human Molecular Genetics 2, 767-773; Ding, Y. C., Chi, H. C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al. (2002) Proc. Natl. Acad. Sci. USA 99, 309-314; Grady, D. L., Chi, H. C., Ding, Y. C., Smith, M., Wang, E., Schuck, S., Flodman, P., Spence, M. A., Swanson, J. M., and Moyzis, R. K. Mol Psychiatry 8, 536-545). Much of this variation is the result of length and single nucleotide polymorphism (SNP) changes in a 48 bp tandem repeat (VNTR) in exon 3, encoding the third intracellular loop of this D2-like receptor. Alleles containing two (2R) to eleven (11R) repeats are found, with over 67 different haplotype variants uncovered to date. The three most common variants (2R, 4R, and 7R), however, represent over ninety percent of the observed allelic diversity. In most geographical locations, the 4R allele is the most common, while 2R and 7R allele frequencies vary widely (Chang, F.-M., Kidd, J. R., Livak, K. J., Pakstis, A. J., and Kidd, K. K. (1996) Hum. Genet. 98, 91-101).

The functional significance of these length/sequence changes in the DRD4 protein, in a region that couples to G proteins and mediates intracellular cAMP levels, has been documented (Jovanovic, V., Guan, H. C., and Van Tol, H. H. M. (1999) Pharmacogenetics 9, 561-568; Oak, J. N., Oldenhof, J., and Van Tol, H. H. M. (2000) European J Pharmacology 404, 303-327). In particular, the 7R variant exhibits a blunted ability to reduce cAMP levels in comparison to the common 4R variant. The DRD4 protein is expressed in a number of brain regions, with high level expression in the prefrontal cortex, thought to be involved in cognition, attention and other higher brain functions. Significantly, DRD4 knockout mice display better performance on complex motor tasks, are supersensitive to cocaine, ethanol and methamphetamine, and exhibit reduced exploration of novel stimuli (Rubinstein, M., Phillips, T. J., Bunzow, J. R., Falzone, T. L., Dziewczapolski, G., Zhang, G., et al. (1997) Cell 90, 991-1001; Dulawa, S. C., Grandy, D. K., Low, M. J., Paulas, M. P., and Geyer, M. A. (1999) J Neurosci 19, 9550-9556). Taken together, these results are consistent with the proposal that DRD4 receptors act as inhibitors of neuronal firing, especially in the prefrontal cortex.

Based on these biochemical and physiological observations, a number of investigations have looked for associations between particular alleles of this highly variable gene and behavioral phenotypes (Swanson, J., Deutsch, C., Cantwell, D., Posner, M., Kennedy, J., Barr, C., Moyzis, R., Schuck, S., Flodman, P., and Spence, M. A. (2001) Clinical Neuroscience Research 1, 207-216; Faraone, S. V., Doyle, A. E., Mick, E., and Biederman, J. (2001) Am J Psychiatry 158, 1052-1057; Klugar, A. N., Siegfried, Z., and Ebstein, R. P. (2002) Mol Psychiatry 7, 712-717). While some studies have suggested that the DRD4 7R allele might be associated with the personality trait of novelty seeking, the most reproduced association is between the 7R allele and attention deficit/hyperactivity disorder (ADHD). ADHD is the most prevalent disorder of childhood (approximately 5% incidence), defined by symptoms of developmentally inappropriate inattention, impulsivity, and hyperactivity. The approximately two fold greater prevalence of the DRD4 7R allele in ADHD probands (λ=1.9), calculated from a recent meta-analysis, indicates that this allele is associated with a significant fraction (25%-50%) of the attributable genetic risk for the disorder.

We have shown above by DNA resequencing/haplotyping of 600 DRD4 VNTRs, representing a worldwide population sample, that the origin of most haplotype variants could be explained by simple one-step recombination/mutation events. In contrast, the 7R allele is not simply related to the other common alleles, differing by greater than 6 recombinations/mutations. This unusual sequence architecture of the 7R VNTR, suggesting it arose as a rare mutational event, led to exploratory measures of linkage disequilibrium (LD) between the 4R and 7R alleles. Large discrepancies between allele ages estimated from low intra-allelic variability and high population frequency are taken as evidence that selection has increased the frequency of an allele beyond that expected by chance (Slatkin, M. and Rannala, B. (2000) Ann. Rev. Genomics Hum. Genet. 1, 225-249; Tishkoff, S. A., Varkonyl, R., Cahinhinan, N., Abbes, S., Argyropoulos, G., Destro-Bisol, G., et al. (2001) Science 293, 455-462; Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z. P., Richter, D. J., Schaffner, S. F., et al., (2002) Nature, 419, 832-837). Strong LD was found between the 7R-allele and four surrounding DRD4 polymorphisms, suggesting this allele is significantly “younger” than the common 4R-allele. Our preliminary estimates placed the origin of the DRD4 7R allele at approximately 40,000 years ago, a time of major human expansion out of Africa and the appearance of radical new technology (the upper Paleolithic) (Harpending, H. and Rogers, A. (2000) Annu. Rev. Genomics Hum. Genet. 1, 361-385; Ingman, M., Kaessmann, H., Paabo, S., and Gyllensten, U. (2000) Nature 408, 708-713; Underhill, P. A., Shen, P., Lin, A. A., Jin, L., Passarino, G., Yang, W. H., Kauffman, E., Bonne-Tamir, B., Bertranpetit, J., Francalacci, P., et al. (2000) Nature Genetics 26, 358-361). We speculate that these events and the appearance and selection for the DRD4 7R allele may be related.

If the DRD4 7R allele arose recently and underwent strong positive selection, why is it now disproportionately represented in individuals diagnosed with ADHD? One possibility is that an adjacent polymorphism in strong LD with the 7R VNTR is actually 1) associated with ADHD, or 2) the target of selection. We have argued that selection for an adjacent site was unlikely, given the distinct and unusual DNA sequence organization of the DRD4 7R allele itself, but due to the high density of SNPs in the human genome it remained a possibility. Obviously, even if the DRD4 VNTR is the site of selection, strong LD in the region could have carried an adjacent ADHD predisposing polymorphism along with it. Such “hitch-hiking” events should be common, again given the high density of SNPs in human DNA (The International SNP Map Working Group. (2001) Nature 409, 928-933).

Alternatively, however, the biochemical and physiological properties of the DRD4 protein discussed above suggest a more direct relationship. We have proposed that the 7R/ADHD association is an example consistent with the Common Variant-Common Disorder hypothesis (Risch, N. and Merikangas, K. (1996) Science 273, 1516-1517), in which common predisposing alleles result in deleterious effects only when combined with other environmental/genetic factors. We have speculated that the very traits that may be selected for in individuals with a DRD4 7R allele may predispose behaviors that are deemed inappropriate in the typical classroom setting and hence diagnosed as ADHD. In this environmental mismatch hypothesis (Jensen, P. S., Mrazek, D., Knapp, P. K., Steinberg, L., Pfeffer, C., Schowalter, J., et al. (1997) J Am Acad Child Adolesc Psychiatry 36, 1672-1679), the DRD4 7R subset of individuals diagnosed with ADHD is assumed to have a different, evolutionarily successful behavioral strategy rather than a disorder. It is also possible, however, that DRD4 7R, while selected for in human populations, could have deleterious effects only when combined with other genetic variants.

In order to clarify some of these issues, we report an extensive analysis of polymorphisms surrounding the DRD4 VNTR by genomic resequencing. Remarkably, we show that the 7R allele exhibits strong worldwide LD, in geographic locations as diverse as sub-Saharan Africa and South American rainforests. By intra-allelic comparison at 18 high frequency polymorphic sites spanning the locus, we confirm that the 7R allele arose only approximately 42,500 years ago. Further, the pattern of recombination at these sites is that expected for selection acting at the VNTR itself, rather than at an adjacent polymorphism.

Materials and Methods

Establishing Cell Lines and DNA Purification. Lymphoblastoid cell lines were established for all individuals. Methods for transformation, cell culture, and DNA purification have been described (above; see also, Ding, Y. C., Chi, H. C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al., (2002) Proc. Natl. Acad. Sci. USA 99, 309-314; Grady, D. L., Chi, H. C., Ding, Y. C., Smith, M., Wang, E., Schuck, S., Flodman, P., Spence, M. A., Swanson, J. M., and Moyzis, R. K. Mol Psychiatry 8, 536-545; Chang, F.-M., Kidd, J. R., Livak, K. J., Pakstis, A. J., and Kidd, K. K. (1996) Hum. Genet. 98, 91-101). All individuals gave their informed consent before their inclusion in this study, which was carried out under protocols approved by the Human Subjects Committees at the participating institutions. The geographical/ethnic origins of the 103 individuals used in this study, grouped by genotype, are:

4R/4R, 20 African (11 Biaka, 3 Chaga, 3 Mboti, 2 Hausa, 1 African American), 24 European (11 unspecified European, 5 Irish, 3 English, 3 German, 1 Greek, 1 Italian), 7 Asian (5 Han Chinese, 2 Japanese);

7R/7R, 6 African (2 Biaka, 2 Hausa, 1 Chaga, 1 African American), 16 European (6 unspecified European, 3 European/Hispanic, 2 Irish, 1 Italian, 1 Druze, 1 Danish, 1 English, 1 German), 19 Americas (6 Karitiana, 5 Ticuna, 4 Maya, 4 Surui), 2 Pacific (Nasioi);

2R/2R, 3 European (2 unspecified European, 1 Russian), 6 Asian (5 Han Chinese, 1 Yakut).

Twenty (19%) of these individuals were ADHD Probands (3) (15 European, 4 Asian, 1 African), including one 2R/2R, fourteen 4R/4R and five 7R/7R genotypes. Primate DNA was obtained from 5 chimpanzee (Pan troglodytes), 5 bonobo (Pan paniscus), and 5 western lowland gorilla (Gorilla gorilla gorilla) individuals.

PCR amplification and DNA sequencing. The entire DRD4 allelic region was PCR amplified as three overlapping fragments (totaling 6.3 kb), which cover positions 140173 to 146480 in GenBank accession number AC021663. The current Human Genome Project (HGP) assembly contains a 9 kb unordered fragment containing the DRD4 locus (from BAC RP11-49619) but the terminal DRD4 upstream region of this contig contains 1.9 kb of Alu DNA. Forward and reverse primers for these amplifications were 140173^(F1) (5′-GTGGTCGCAGACATCTTGG-3′) (SEQ. ID NO. 6), 142075^(R1) (5′-TAGACGAAGAGCGGCAGCA-3′) (SEQ. ID NO. 7), 142057^(F2) (5′-TGCTGCCGCTCTTCGTCTA-3′) (SEQ. ID NO. 8), 145072^(R2) (5′-ATGCTGCTGCTCTACTGGG-3′) (SEQ. ID NO. 9), 144901^(F3) (5′-CCTGCTGTGCTGGACGCCCT-3′) (SEQ. ID NO. 10), and 146480^(R3) (5′-TAGTCGGAGAAGGTGTCCTG-3′) (SEQ. ID NO. 11). PCR amplification and excess primer and dNTP removal were as described above (see also, Ding, Y. C., Chi, H. C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al. (2002) Proc. Natl. Acad. Sci. USA 99, 309-314; Grady, D. L., Chi, H. C., Ding, Y. C., Smith, M., Wang, E., Schuck, S., Flodman, P., Spence, M. A., Swanson, J. M., and Moyzis, R. K. Mol Psychiatry 8, 536-545). Additional primer sequences for forward and reverse sequencing of the DRD4 amplification products are available on our web site (www.genome.uci.edu). DNA cycle sequencing on ABI 3100 and 3700 automated sequencers was as described above (see also, Ding, Y. C., Chi, H. C., Grady, D. L., Morishima, A., Kidd, J. R., Kidd, K. K., Flodman, P., Spence, M. A., Schuck, S., Swanson, J. M., et al. (2002) Proc. Natl. Acad. Sci. USA 99, 309-314; Grady, D. L., Chi, H. C., Ding, Y. C., Smith, M., Wang, E., Schuck, S., Flodman, P., Spence, M. A., Swanson, J. M., and Moyzis, R. K. Mol Psychiatry 8, 536-545; Riethman, H. C., Xiang, Z., Paul, S., Morse, E., Hu, X.-L., Flint, J., Chi, H.-C., Grady, D. L., and Moyzis, R. K. (2001) Nature 409, 948-951).
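The layout of the three overlapping amplicons can be illustrated with a short sketch. It assumes that the number in each primer name is the coordinate of that primer's 5′ end in the reference sequence, which is an interpretation rather than something stated explicitly; the helper names are hypothetical. Under that assumption, the R1 and F2 primers occupy the same 19 bp and should be reverse complements of each other, which the sketch checks.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_bp(left, right):
    """Number of positions shared by two (start, end) intervals, inclusive."""
    return max(0, min(left[1], right[1]) - max(left[0], right[0]) + 1)


# Amplicon coordinates, read from the primer names (assumption: each number
# is the coordinate of the primer's 5' end).
FRAGMENT_1 = (140173, 142075)
FRAGMENT_2 = (142057, 145072)
FRAGMENT_3 = (144901, 146480)

# Primer sequences as listed in the text (SEQ. ID NO. 7 and NO. 8).
PRIMER_R1 = "TAGACGAAGAGCGGCAGCA"
PRIMER_F2 = "TGCTGCCGCTCTTCGTCTA"

if __name__ == "__main__":
    print("fragment 1/2 overlap (bp):", overlap_bp(FRAGMENT_1, FRAGMENT_2))
    print("fragment 2/3 overlap (bp):", overlap_bp(FRAGMENT_2, FRAGMENT_3))
    print("region covered (bp):", FRAGMENT_3[1] - FRAGMENT_1[0])
    print("R1 is the reverse complement of F2:",
          reverse_complement(PRIMER_R1) == PRIMER_F2)
```

The coordinate difference between the outermost primers agrees with the 6,307 bp of contiguous sequence reported in the Results.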

Analysis of sequence data. Analysis of sequence data was aided by Phred, Phrap, Polyphred and Consed (Nickerson, D. A., Tobe, V. O., and Taylor, S. L. (1997) Nucleic Acids Res 14, 2745-2751).

Capture of individual genotypes/haplotypes into a database (SNPMAN). The collection and editing of SNPs into a relational database is done via an in-house software package we have designated SNPMAN (Grady, D. L., Chi, H. C., Ding, Y. C., Smith, M., Wang, E., Schuck, S., Flodman, P., Spence, M. A., Swanson, J. M., and Moyzis, R. K. Mol Psychiatry 8, 536-545). Visual displays of SNP data were performed using VG (visual genotyping) (Nickerson, D. A., Taylor, S. L., Weiss, K. M., Clark, A. G., Hutchinson, R. G., Steingard, J., et al. (1998) Nature Genet 19, 233-240). Information on all SNPs identified in this study is available on our web site (www.genome.uci.edu).

Protein modeling. DRD4 protein variants were modeled using the crystallographic structure of rhodopsin as a template (Filipek, S., Teller, D. C., Palczewski, K., and Stenkamp, R. (2003) Ann Rev Biophysics Biomol Structure 32, 375-397).

Allele age calculations. Allele age calculations were conducted by standard methods. Briefly:

t = [1/ln(1−c)] ln[(x(t)−y)/(1−y)], where t=allele age in generations, c=recombination rate, x(t)=frequency in generation t, and y=frequency on normal chromosomes. We assumed the origin of the 7R-allele was on a specific 4R haplotype, and calculations utilized the extreme values of c determined from the telomeric recombination frequencies (including 11p) obtained by Kong et al. (Kong, A., Gudbjartsson, D. F., Sainz, J., Jonsdottir, G. M., Gudjonsson, S. A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. (2002) Nature Genet 31, 241-247) (2 cM/Mb to 4 cM/Mb). For example, the T/C polymorphism at position 140,692 in the DRD4 consensus sequence is 3889 bp upstream of the VNTR, and hence c values ranging from 0.0000778 to 0.0001556 were used (from the average recombination rate per Mb times the VNTR-SNP distance). In all cases, the frequency on normal chromosomes (y) was assumed to be that observed on chromosomes obtained from African 4R/4R individuals. Similar results were obtained using y obtained from the entire 4R/4R population sample. The frequency of the derived allele x(t) was that observed in the total population of 7R/7R individuals. For example, the T/C polymorphism at position 140,692 has y=5.3% (the percent of the C variant in African 4R/4R individuals) and x(t)=84.9% (the percent of the C variant in all 7R/7R individuals). For conversion from time t in generations to years, a generation time of 20 years was assumed.
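As a worked illustration of the formula, the sketch below applies it to the T/C polymorphism at position 140,692 using only the values quoted in this paragraph (distance 3889 bp, y=5.3%, x(t)=84.9%, 2 to 4 cM/Mb, 20 years per generation). The function names are hypothetical and the code is a minimal sketch, not the software used for Table 5.

```python
import math

GENERATION_YEARS = 20  # generation time assumed in the text


def recombination_fraction(distance_bp, cm_per_mb):
    """Recombination fraction c between the VNTR and a site distance_bp away,
    for a regional recombination rate given in cM/Mb (1 cM = 0.01)."""
    return (distance_bp / 1e6) * (cm_per_mb / 100.0)


def allele_age_generations(c, x_t, y):
    """t = [1/ln(1-c)] * ln[(x(t) - y)/(1 - y)], in generations."""
    return math.log((x_t - y) / (1.0 - y)) / math.log(1.0 - c)


def allele_age_years(distance_bp, cm_per_mb, x_t, y):
    """Allele age in years for one polymorphic site."""
    c = recombination_fraction(distance_bp, cm_per_mb)
    return allele_age_generations(c, x_t, y) * GENERATION_YEARS


if __name__ == "__main__":
    # T/C polymorphism at 140,692: 3889 bp upstream of the VNTR.
    for label, rate in (("Age1 (4 cM/Mb)", 4.0), ("Age2 (2 cM/Mb)", 2.0)):
        years = allele_age_years(3889, rate, x_t=0.849, y=0.053)
        print(f"{label}: about {years:,.0f} years")
```

With these rounded inputs the sketch gives ages within about half a percent of the Age1 and Age2 entries listed for this site in Table 5; the small difference is presumably due to rounding of the quoted frequencies.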

Linkage Disequilibrium. Analysis and display of LD was conducted using the GOLD program (Abecasis, G. R., and Cookson, W. O. (2000) Bioinformatics 16, 182-183).

Results

The unusual nature of the sequence architecture of the DRD4 7R VNTR, suggesting it arose as a recent rare mutational event, led us to determine if differences in LD exist between the 4R and 7R alleles. We resequenced 6,307 bp of contiguous DNA surrounding the DRD4 VNTR from 103 individuals (1.5 Mb total), chosen from previous screenings as homozygous for the VNTR (4R/4R, 7R/7R, 2R/2R; FIG. 7). DRD4 loci from 51 4R/4R individuals, 43 7R/7R individuals and 9 2R/2R individuals were resequenced. 7R/7R and 2R/2R individuals were highly oversampled in comparison to their frequency in the population. This approach was developed as a direct and efficient method to estimate the haplotype diversity surrounding the putative ancestral 4R allele in comparison to the recent 7R allele. The resulting sequence data was processed by SNPMAN and polymorphisms displayed using VG.

FIG. 7 displays the polymorphism distribution of individuals grouped by genotype (4R/4R, 7R/7R, and 2R/2R) and geographic origin (African, European, etc.). Individuals were intentionally chosen from diverse populations. For example, the African samples represent 13 Biaka, 4 Chaga, 4 Hausa, 3 Mboti and 2 African American individuals (see Methods). Due to the wide variation in 7R allele distribution, our sample includes an abundance of 7R/7R individuals of North and South American ancestry and none from Asia, where the 7R allele frequency is only 0.01 (FIG. 7). Our 4R/4R sample intentionally included a large fraction (39%) of individuals of African ancestry (FIG. 7), in order to estimate the “ancestral” frequency of polymorphisms (see below).

Not including VNTR variants, a total of 70 SNPs/polymorphisms were detected (on average, one per 90 bp), many at low frequency (FIG. 7). As expected, most of these low abundance SNPs were not in current databases. As can be seen in FIG. 7, the polymorphism spectral distribution of the 4R/4R homozygotes exhibits little LD over the region examined. In addition, twenty-eight percent (20/70) of the observed SNPs were found only in African samples (FIG. 7). These results are consistent with many studies on other genomic regions, and likely reflect the “fingerprint” of an out of Africa expansion of modern humans and a genetic bottleneck in European and Asian populations. FIG. 8 shows a graphical display of LD for the same 4R/4R data, using the GOLD program. GOLD displays all pairwise LD values as a color gradient aligned with the linear DNA sequence. As can be seen in FIG. 8, there is little LD above the 0.6 value expected at these close (<6 kb) distances.

In contrast, the evidence for strong LD surrounding the 7R allele is dramatic, with most 7R/7R individuals exhibiting the same polymorphisms at most sites (FIGS. 7 and 8). All 7R alleles, including those from African populations, exhibit the same strong LD. By resequencing this same genomic region in 15 primate genomes (Chimpanzee, Bonobo, and western lowland Gorilla), the likely ancestral SNP could be determined unambiguously for most SNP pairs. Seventy-six percent (13/17) of the most common variants (Table 5) were inferred to be ancestral in origin, with one SNP (144,842) having both variants in primate DNAs. Four of the most common variants in the population were “human specific” (Table 5). Forty-one percent (7/17) of the observed polymorphisms in tight LD with the 7R VNTR were the rarer human specific SNPs (FIG. 7 and Table 5).

Table 5. Calculated allele age for DRD4 7R. Eighteen polymorphisms in the DRD4 sequence are arranged in upstream to downstream order, and distance to the exon 3 VNTR is indicated in bp; (−) denotes a site upstream of the VNTR. The most frequent polymorphism is listed first, in all cases the “ancestral” variant determined from primate DRD4 resequencing, except for the four noted with an asterisk in which the less common variant is ancestral. The frequency of the common polymorphism in African 4R/4R individuals and all 7R/7R individuals is given. All values were obtained from the data displayed in FIG. 7, except for polymorphisms 140,438, 144,842 and 144,862 which were obtained from a much larger sample set (over 2,000 individuals). Asterisks indicate “human specific” SNPs tightly linked to the 7R allele. Allele age in years was calculated from the extreme values of telomeric recombination reported in Kong et al. (Age1=4cM/Mb and Age2=2cM/Mb) using standard methods. The average value obtained from all polymorphisms is 42,500 years (average Age1 and Age2), with maximum likely limits of 20,000-65,000 years.

Polymorphism            Distance   4R/4R   7R/7R    Age1     Age2
140,438 L/S (120 bp)*   4143(−)    61.9%   90.8%*   33,361   66,715
140,582 G/del           3999(−)    95.0%   13.9%*   19,175   39,557
140,692 T/C             3889(−)    94.7%   15.1%*   22,269   44,653
140,892 T/C             3689(−)    70.0%   95.4%    22,558   45,119
141,507 C/T             3074(−)    73.7%   97.7%    14,884   29,771
142,426 G/A             2155(−)    90.0%   97.7%    28,961   57,922
143,578 A/del           1003(−)    78.4%   98.8%    28,493   56,989
143,766 C/A*            815(−)     77.5%   4.6%     37,539   75,078
143,862 G/del           719(−)     95.0%   2.3%*    17,037   34,075
VNTR                    0
144,842 G/A             261        88.7%   1.6%     34,865   69,731
144,862 G/C             281        87.9%   1.6%*    32,686   65,373
145,239 del/G*          658        43.8%   2.3%     31,637   63,221
145,295 T/C             714        79.0%   2.3%*    20,690   41,381
145,353 del/G           772        41.7%   2.3%*    27,615   55,181
145,684 A/C*            1103       69.4%   3.5%     23,457   46,915
146,041 T/C             1460       42.1%   3.8%     26,699   53,397
146,056 T/C             1475       57.9%   96.3%    26,989   53,966
146,293 C/A             1712       44.7%   4.6%     28,187   56,367
Average                                             26,505   53,078
SD                                                  6,112    12,139

Only a single new high abundance SNP was found in the 7R alleles examined, located in the downstream region of the gene (146,033; asterisk in FIG. 7). The majority of individuals containing this SNP were of North or South American ancestry (Karitiana, Ticuna, Maya, and Surui), suggesting a possible New World origin. This SNP was not found in our African population samples, and was at low frequency in our European populations, which included some individuals with partial Hispanic (and likely North/South American) ancestry (FIG. 7).

One exception to the strong LD found at the DRD4 7R locus is in a small 288 bp region at the promoter (−809 to −521 in FIG. 7), where 8 tightly spaced SNPs are found at comparable frequencies in both 4R and 7R alleles. Five of these SNPs are clustered in a region of only 95 bp. It is difficult to understand this specific breakdown of LD at the promoter unless numerous gene conversion events/mutations/selections have occurred (Ardlie, K., Liu-Cordero, S. N., Eberle, M. A., Daly, M., Barrett, J., Winchester, E., Lander, E. S., and Kruglyak, L. (2001) Am J Hum Genet 69, 582-589). Similar high frequency variations are found in this region in the limited primate samples examined, including more extensive deletions in chimp and gorilla (data not shown). These highly variable SNPs (140,989-141,277) are not included in Table 5, their ancestral origin (above) cannot be determined, and they cannot be used in allele age determinations (below). Regardless of the mechanism of homogenization at this small DRD4 promoter region, the strong LD observed at the 7R allele continues upstream of this region, with 4 high frequency SNPs/polymorphisms in tight linkage with the VNTR (FIG. 7 and Table 5). Genotyping of other VNTRs associated with three DRD4 adjacent loci (PTDSS2, HRAS, and SCT) indicates that the region of strong LD surrounding the DRD4 7R allele extends for at least 50-100 kb (data not shown).

Interestingly, from the resequencing of a sample of 2R/2R homozygotes (FIG. 7), the 2R allele appears to be a recombination product between a 4R and a 7R allele, as we originally proposed. The 2R VNTR downstream region contains a polymorphism pattern identical to that found in 7R alleles, while the VNTR upstream region is more variable, suggesting more than a single origin for this proposed recombination (FIG. 7). Most of the examined 2R alleles, however, contain a unique SNP (142,115; asterisk in FIG. 7) in the first intron, found only in a single 4R individual of African ancestry, suggesting a common origin and expansion for these 2R alleles.

Calculations of allele age based on the relatively high worldwide population frequency of the DRD4 2R, 4R and 7R alleles suggest that these alleles are ancient (>300,000-500,000 years old). On the other hand, calculations of allele age based on the observed intra-allelic variability (Table 5) suggest the 7R allele is 10 fold “younger” (42,500 years). Such large discrepancies between allele ages calculated by these two methods are usually taken as evidence that strong selection has increased the frequency of the allele to higher levels than expected by random genetic drift. The absolute values of these estimates are greatly affected by the assumptions used in their computations, for example the assumed recombination frequency. For the calculations in Table 5, we have used the extremes of estimates of recombination frequency observed for the telomeric regions of human chromosomes (including 11p). All 18 high frequency DRD4 SNPs used to estimate allele age yield comparable results (Table 5). This suggests that the average of these values (42,500 years) is a reasonable current estimate of the allele age for DRD4 7R, comparable to our prior estimate of 40,000 years based on only four adjacent SNPs (2). Using the extremes of assumed recombination frequency (Age1 and Age2), plus or minus the standard deviation, yields an estimate of the limits of allele age from 20,000-65,000 years (Table 5). The proposed origin of the 2R allele as a 7R allele derivative (FIG. 7) indicates that it must also be a young allele. The discrepancy between the observed high frequency of the 2R allele, especially in Asian populations, and its likely recent origin (FIG. 7) suggests that it too has likely increased in frequency by positive selection.
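The limits quoted above follow from the per-site ages in Table 5. The sketch below is illustrative only: it takes the Age1 and Age2 columns as listed in Table 5 and computes their means and standard deviations, then forms the lower limit as mean Age1 minus its SD and the upper limit as mean Age2 plus its SD. Using the population (rather than sample) standard deviation is an assumption of this sketch, chosen because it appears to reproduce the SD values reported in the table.

```python
import statistics

# Age1 (4 cM/Mb) and Age2 (2 cM/Mb) in years for the 18 sites of Table 5,
# in upstream-to-downstream order.
AGE1 = [33361, 19175, 22269, 22558, 14884, 28961, 28493, 37539, 17037,
        34865, 32686, 31637, 20690, 27615, 23457, 26699, 26989, 28187]
AGE2 = [66715, 39557, 44653, 45119, 29771, 57922, 56989, 75078, 34075,
        69731, 65373, 63221, 41381, 55181, 46915, 53397, 53966, 56367]


def summarize(ages):
    """Return (mean, population standard deviation) of a list of ages."""
    return statistics.mean(ages), statistics.pstdev(ages)


def age_limits(age1, age2):
    """Lower limit: mean(Age1) - SD(Age1); upper limit: mean(Age2) + SD(Age2)."""
    mean1, sd1 = summarize(age1)
    mean2, sd2 = summarize(age2)
    return mean1 - sd1, mean2 + sd2


if __name__ == "__main__":
    for label, ages in (("Age1", AGE1), ("Age2", AGE2)):
        mean, sd = summarize(ages)
        print(f"{label}: mean {mean:,.0f} years, SD {sd:,.0f} years")
    low, high = age_limits(AGE1, AGE2)
    print(f"limits: about {low:,.0f} to {high:,.0f} years")
```

Rounded to the nearest 5,000 years, the resulting limits correspond to the 20,000-65,000 year range given in the text.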

The data in FIG. 7 and Table 5 can also be used to test if the DRD4 VNTR itself, rather than an adjacent SNP, is the target of selection. Ideally, one should observe an increase in recombination (and lower LD) as distance from the selected polymorphism is increased. FIG. 9 plots distance from the DRD4 7R VNTR versus percent recombination. As expected if the DRD4 7R VNTR is the target of selection, the observed recombination is lowest near the VNTR, and increases with distance in both directions. Grouping the population sample by any of the other 18 SNPs (for example, splitting the sample based on the G/A 142,426 SNP rather than the 4R/7R VNTR) yielded largely random recombination patterns for adjacent SNPs (data not shown). While the observed recombination fraction is quite low, and there is significant scatter in the data (FIG. 9), these results support the hypothesis that the DRD4 7R VNTR is the target of selection.

Discussion

In this study, we have expanded our LD analysis of the DRD4 locus byresequencing the entire locus in 2R, 4R, and 7R homozygous individuals.This method was chosen as an accurate and efficient approach todetermine the comparative LD of two alleles, requiring littlestatistical manipulation to infer haplotype differences. Using thisapproach (FIGS. 7 and 8), the pattern of LD surrounding the DRD4 4Rallele is that expected for an ancient gene locus (300,000-500,000 yearsold), in which haplotype diversity is greatest in African populations,and more restricted outside Africa.

In contrast, the evidence for strong LD surrounding the 7R allele isdramatic (FIGS. 7 and 8). Such worldwide LD for a single selected humanallele is remarkable. For example, in one of the best-characterizedexamples of selection in humans, the frequencies of low-activity allelesof glucose-6-phosphate dehydrogenase are highly correlated with theprevalence of malaria, yet many regional variants have been selectedfor. There is no worldwide “malaria resistant” variant, presumablybecause the introduction of agriculture 10,000 years ago (and thePlasmodium parasite) selected for independent regional mutations. Byintra-allelic comparison at 18 high frequency polymorphic sites, we canestimate that the DRD4 7R allele arose approximately 42,500 years ago(with maximum likely limits of 20,000-65,000 years ago; Table 5).Further, the finding that forty-one percent of the 7R adjacent SNPs intight LD are “human specific” (Table 5) argues for the derivation ofthis variant by mutation from the common human 4R allele, rather thanimportation from a related hominid lineage. Population bottlenecks andlocal admixture cannot explain the observed results. We propose,therefore, that the worldwide LD found for the DRD4 7R allele is areflection of strong selection for this allele at the time of the majorout of Africa exodus.

We suggested it is unlikely that selection for an adjacent gene canaccount for the proposed selection, given the distinct and unusual DNAsequence of the DRD4 7R VNTR itself. We have now shown thatrecombination frequency with adjacent SNPs is likely centered on theVNTR in 7R alleles (FIG. 9), suggesting that it is indeed the target ofselection. Strong LD with the DRD4 7R allele can be detected at least50-100 kb from the VNTR (near the PTDSS2, HRAS and SCT loci, data notshown). However, since the current HGP assembly in this subtelomericregion contains many gaps and ambiguous contig orders, it is impossibleat present to refine these LD studies. Further work to define the limitsof LD for this locus will help clarify both the estimates of allele ageand the evidence for VNTR selection.

The breakdown of this strong LD at a small (288 bp) region at thepromoter of the 7R allele is surprising (FIG. 7), and suggests thatfrequent gene conversion events/mutations/selections have occurred atthis region. One can only speculate as to what mechanisms might beinvolved. It is especially intriguing, however, that this homogenizationoccurs at the promoter. Given that the CpG frequency at this site is notsignificantly higher than the remainder of this GC-rich gene, we suggestthat high frequency gene conversion might explain this homogenization.Similar high frequency variations are found in primates, thus thisregion is a hotspot for such changes. Small hotspots for gene conversionhave been proposed to exist at various loci in the human genome. Theoverall strong LD associated with the 7R allele continues upstream ofthis anomalous region (FIG. 7). Such data suggest using caution ininferring LD surrounding a particular genomic region based on a limitednumber of markers.

Extensive biochemical analyses of DRD4 protein variants have been conducted (5-7). The 7R protein has a blunted response for cAMP reduction, requiring a three-fold increase in dopamine concentration for reductions comparable to the 4R protein. This “suboptimal” response of the 7R allele to dopamine was hypothesized to underlie its association with the personality trait of novelty seeking and ADHD. It was suggested that the inhibitory neurons utilizing the DRD4 7R receptor would require increased dopamine for “normal” function (Swanson, J. M., Oosterlaan, J., Murias, M., Schuck, S., Flodman, P., Spence, M. A., Wasdell, M., Ding, Y. C., Chi, H. C., Smith, M., et al. (2000) Proc. Natl. Acad. Sci. USA 97, 4754-4759). Such increased dopamine levels were hypothesized to be provided by risk taking behavior (in the case of novelty seeking) or methylphenidate (in the case of ADHD). Methylphenidate is thought to act by binding to the dopamine transporter and raising the levels of dopamine at the synapse.

We propose a simple model integrating the known biochemical, physiological and genetic data regarding the common DRD4 alleles (FIG. 10). The 4R allele appears to be the dominant allele throughout most of human prehistory. This ancestral allele has the fewest amino acid changing variants, implying strong purifying selection. The 7R allele arose as a rare mutation approximately 42,500 years ago that significantly blunted the receptor's response to dopamine. This blunted response led to behaviors that were selected for in certain environments, and the two alleles (4R and 7R) coexisted in a balanced selection system, their relative frequencies varying by both chance and the environmental/cultural conditions. For example, it has been suggested that resource-depleted, time-critical, or rapidly changing environments might select for individuals with “response ready” adaptations, while resource-rich, time-optimal, or little changing environments might select against such adaptations. We have speculated that such a “response ready” adaptation might have played a role in the out of Africa exodus, and that allele frequencies of genes associated with such behavior would certainly be influenced by the local cultural milieu (above; see also, Harpending, H. and Cochran, G. (2002) Proc. Natl. Acad. Sci. USA 99). Consistent with this “response ready” behavior hypothesis is the significantly better performance of DRD4 knockout mice on tests of complex coordination, and the observed faster reaction times exhibited by ADHD individuals with a 7R allele in comparison to non-7R individuals (Langley, K., Marshall, L., van den Bree, M., Thomas, H., Dphil, O., O'Donovan, M., and Thapar, A. (2003) Amer J Psychiatry, in press).

The genetic data suggest that most 2R alleles are 7R derivatives, and likely had limited (yet multiple) origins (FIG. 10). Interestingly, the 2R variant also has a blunted cAMP response, but one midway between the 4R allele and 7R allele (FIG. 10). Perhaps individuals with 2R alleles exhibit behaviors “intermediate” between those manifested by 4R and 7R alleles? This “non-linear” response (i.e., cAMP reduction capability is not linearly related to DRD4 VNTR repeat length) is consistent with the genetic evidence, and suggests a typical biochemical “optimum” strategy (FIG. 10). In this model, the 4R variant has been honed for hundreds of thousands of years to function optimally, while the new 7R and 2R variants are suboptimal yet confer a behavioral advantage in some environments. We propose that all three alleles are maintained in the population by balanced selection, their relative frequencies dependent on both chance and local selective pressures.

Such frequency dependent adaptive strategies are common, and are predicted by evolutionary game theory (Smith, J. M. Evolution and the Theory of Games. (1982) Cambridge: Cambridge University Press). A now classic example is the “rock-paper-scissors” color morphs in the side-blotched lizard (Sinervo, B. and Lively, C. M. (1996) Nature 380, 240-243). In this species, color is controlled by a single locus (OBY) that serves as a genetic marker for three different male behavioral strategies. Orange males usurp territory, blue males are mate guards, and yellow males are sneakers. Sneakers beat aggressive usurpers, mate-guards beat sneakers, and usurpers beat mate-guards. Male competition drives cycles analogous to a rock-paper-scissors game, with all three strategies successfully reproducing (at varying frequencies) in the population.
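The cyclic, frequency dependent dynamics described for the three lizard morphs can be sketched with a simple replicator model. The following is an illustrative toy only: the payoff values (+1 for a win, -1 for a loss), the step size, the starting frequencies, and all function names are assumptions chosen for the example, not measurements from the cited study and not a model of DRD4 allele frequencies.

```python
# Order of strategies: orange (usurper), blue (mate guard), yellow (sneaker).
# PAYOFF[i][j] is the hypothetical payoff to strategy i when it meets strategy j:
# usurpers beat mate guards, mate guards beat sneakers, sneakers beat usurpers.
PAYOFF = [
    [0, 1, -1],   # orange
    [-1, 0, 1],   # blue
    [1, -1, 0],   # yellow
]


def replicator_step(freqs, payoff=PAYOFF, dt=0.01):
    """Advance the strategy frequencies by one small replicator-equation step."""
    n = len(freqs)
    # Fitness of each strategy depends on the current mix of opponents
    fitness = [sum(payoff[i][j] * freqs[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(f * w for f, w in zip(freqs, fitness))
    # Strategies fitter than average grow, the others shrink
    updated = [f + dt * f * (w - mean_fitness) for f, w in zip(freqs, fitness)]
    total = sum(updated)
    return [u / total for u in updated]   # renormalize so frequencies sum to 1


def simulate(freqs, steps, dt=0.01):
    """Return the list of frequency vectors visited, including the start."""
    trajectory = [list(freqs)]
    for _ in range(steps):
        freqs = replicator_step(freqs, dt=dt)
        trajectory.append(freqs)
    return trajectory


# Start with orange common: yellow (which beats orange) gains in the first step
path = simulate([0.5, 0.3, 0.2], steps=1000)
print(path[1])
print(path[-1])
```

Under these assumptions no strategy is driven to extinction over the simulated interval; each rises when the strategy it beats is common and falls when the strategy that beats it becomes common.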

While this speculative model (FIG. 10) is based on available genetic, biochemical, and physiological data, obviously only further work can test, refine and modify these ideas. The evidence for selection acting at the DRD4 locus is strong, however (FIGS. 7 and 9), and challenges us to determine the specific mechanism driving it. Regardless of the ultimate details, is it reasonable to think that a single gene variation can modify human behavior and be shaped by cultural diversity? We argue that just such single gene changes regulating complex social behavior have been identified in other organisms (Krieger, M. J. B. and Ross, K. G. (2002) Science 295, 328-332). We see no reason to think humans should be exempt from similar Darwinian selection (Darwin, C. The Descent of Man and Selection in Relation to Sex. (1871) London: J. Murray.), and suggest the exciting possibility that the DRD4 locus is a prime candidate for such gene-culture interactions.

Diagnostic Test for ADHD using DRD4 Probes.

The invention provides a method for testing patients for ADHD using probes derived from the DRD4 7R allele, or markers from within an area of strong linkage disequilibrium with the DRD4 7R allele. The invention provides a DNA oligomer comprising a DNA sequence complementary to DNA encoding the DRD4 7R allele, or markers from within an area of strong linkage disequilibrium with the DRD4 7R allele. Under appropriate hybridization conditions, such probes may be used to screen samples from individuals for the presence of the DRD4 7R allele, or a marker from within an area of strong linkage disequilibrium with the DRD4 7R allele. It should be readily apparent to those skilled in the art that a sequence complementary to the anti-sense strand of the DRD4 7R allele is also provided by the subject invention.

The DNA oligomer can be labeled with a detectable marker, such as a radiolabeled molecule, a fluorescent molecule, an enzyme, a ligand, or biotin. The labeled oligomer can then be utilized to detect the presence of the DRD4 7R allele, or a marker from within an area of strong linkage disequilibrium with the DRD4 7R allele, so as to diagnose ADHD. This method comprises:

a) obtaining a tissue sample from the subject;

b) treating the sample so as to expose DNA present in the sample;

c) contacting the exposed DNA with the labeled DNA oligomer under conditions permitting hybridization of the DNA oligomer to any DNA complementary to the DNA oligomer present in the sample, the DNA complementary to the DNA oligomer containing the DRD4 7R allele or other marker within the region of strong linkage disequilibrium;

d) removing unhybridized, labeled DNA oligomer; and

e) detecting the presence of any hybrid of the labeled DNA oligomer and DNA complementary to the DNA oligomer present in the sample, thereby detecting the allele or other marker and diagnosing ADHD.
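The complementarity requirement in steps c) through e) can be illustrated in silico. The sketch below treats hybridization as exact Watson-Crick complementarity, which is a deliberate simplification: real hybridization depends on temperature, salt concentration, probe length and mismatch tolerance. The sequences and function names are hypothetical and do not correspond to any DRD4 sequence.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_hybridizes(probe, sample_strand):
    """True if the probe is perfectly complementary to a stretch of the strand.

    Both sequences are written 5' to 3'. A probe anneals to the stretch of
    the sample strand whose sequence equals the probe's reverse complement.
    """
    return reverse_complement(probe) in sample_strand.upper()


# Hypothetical single-stranded sample and a hypothetical target region within it
sample = "TTGACCGTAAGG"
target = "ACCGTA"

# Design the probe as the reverse complement of the target
probe = reverse_complement(target)

print(probe)
print(probe_hybridizes(probe, sample))      # probe matches the target region
print(probe_hybridizes("AAAAAA", sample))   # unrelated probe does not match
```

In the wet-lab method, the removal of unhybridized oligomer in step d) is what makes a detected label in step e) informative; here that corresponds simply to the boolean result.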

Alternatively, DNA isolated from samples taken from individuals can be amplified by PCR using primers directed to the DRD4 gene or other markers within the area of strong linkage disequilibrium, and sequenced to determine the presence of specific DRD4 alleles as described above.
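As a rough illustration of how a PCR product spanning the VNTR could be assigned a repeat number before or alongside sequencing, the sketch below infers the repeat count from amplicon length. It assumes the widely reported 48 bp repeat unit of the DRD4 exon 3 VNTR; the flanking length contributed by the primers, the size tolerance, and the function names are hypothetical values chosen for the example and would depend on the actual primer pair used.

```python
REPEAT_UNIT_BP = 48  # assumed repeat unit length of the DRD4 exon 3 VNTR


def repeats_from_amplicon(amplicon_bp, flank_bp, tolerance_bp=5):
    """Infer the VNTR repeat count from a measured amplicon length.

    amplicon_bp  -- measured PCR product length in base pairs
    flank_bp     -- non-repeat sequence between the primers (hypothetical,
                    depends on the primer pair)
    tolerance_bp -- allowed sizing error before the call is rejected
    """
    variable_bp = amplicon_bp - flank_bp
    count = round(variable_bp / REPEAT_UNIT_BP)
    if count < 1 or abs(variable_bp - count * REPEAT_UNIT_BP) > tolerance_bp:
        raise ValueError(f"amplicon of {amplicon_bp} bp does not fit a whole repeat count")
    return count


def carries_7r(allele_repeat_counts):
    """True if either allele of a genotype is the 7-repeat (7R) allele."""
    return 7 in allele_repeat_counts


# Hypothetical genotype: two amplicons observed with a 200 bp flank
genotype = [repeats_from_amplicon(bp, flank_bp=200) for bp in (392, 536)]
print(genotype, carries_7r(genotype))
```

Sizing alone cannot distinguish sequence variants within alleles of the same length, which is why the method above specifies sequencing of the amplified product.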

All methods for detecting the presence of a specific DNA sequence in DNA isolated from an individual known to one of skill in the art are contemplated to fall within the scope of this invention. Thus, any method of detecting the presence of the DRD4 7R allele, or other marker in the region of linkage disequilibrium, is within the scope of this invention.

While this invention has been described in detail with reference to certain preferred embodiments, it should be appreciated that the present invention is not limited to those precise embodiments. For example, the reagents and methods of the present invention include not just those specifically disclosed, such as specific identified alleles associated with ADHD, but also any markers subsequently found by routine experimentation to fall within the area of strong linkage disequilibrium with the DRD4 alleles identified above. Rather, in view of the present disclosure which describes the current best mode for practicing the invention, many modifications and variations would present themselves to those of skill in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

1. Reagent useful for diagnosing attention deficit hyperactivity disorder (ADHD), comprising a polynucleotide corresponding to an allele of DRD4 associated with individuals exhibiting ADHD.
2. The reagent of claim 1, wherein the polynucleotide corresponds to the DRD4 7R allele.
3. Reagent useful for diagnosing ADHD, comprising a polynucleotide corresponding to a marker the locus of which is within a block of linkage disequilibrium surrounding the DRD4 7R allele.
4. The reagent of claim 3, wherein the locus of the marker is within 100 kb of the DRD4 7R allele.
5. The reagent of claim 3, wherein the locus of the marker is within 50 kb of the DRD4 7R allele.
6. Reagent useful for diagnosing ADHD, comprising a pair of oligonucleotides corresponding to an allele of DRD4 associated with individuals exhibiting ADHD.
7. The reagent of claim 6, wherein the pair of oligonucleotides corresponds to the DRD4 7R allele.
8. Reagent useful for diagnosing ADHD, comprising a pair of oligonucleotides corresponding to a marker the locus of which is within a block of linkage disequilibrium surrounding the DRD4 7R allele.
9. The reagent of claim 8, wherein the locus of the marker is within 100 kb of the DRD4 7R allele.
10. The reagent of claim 8, wherein the locus of the marker is within 50 kb of the DRD4 7R allele.
11. A method for diagnosing ADHD in an individual, comprising the steps of: a) obtaining a tissue sample from the individual; b) treating the sample so as to expose DNA present in the sample; c) contacting the exposed DNA with a labeled DNA oligomer under conditions permitting hybridization of the DNA oligomer to any DNA complementary to the DNA oligomer present in the sample, the DNA complementary to the DNA oligomer containing the DRD4 7R allele; d) removing unhybridized, labeled DNA oligomer; and e) detecting the presence of any hybrid of the labeled DNA oligomer and DNA complementary to the DNA oligomer present in the sample, thereby detecting and diagnosing ADHD.
12. A method for diagnosing ADHD in an individual, comprising the steps of: a) obtaining a tissue sample from the individual; b) treating the sample so as to expose DNA present in the sample; c) contacting the exposed DNA with a labeled DNA oligomer under conditions permitting hybridization of the DNA oligomer to any DNA complementary to the DNA oligomer present in the sample, the DNA complementary to the DNA oligomer containing a marker within a region of strong linkage disequilibrium to the DRD4 7R allele; d) removing unhybridized, labeled DNA oligomer; and e) detecting the presence of any hybrid of the labeled DNA oligomer and DNA complementary to the DNA oligomer present in the sample, thereby detecting and diagnosing ADHD.
13. A method for diagnosing ADHD in an individual, comprising the steps of: a) obtaining a tissue sample from the individual; b) providing an oligonucleotide complementary to the sense strand of the DRD4 gene; c) providing an oligonucleotide complementary to the antisense strand of the DRD4 gene; d) treating the sample so as to expose DNA present in the sample; e) contacting the exposed DNA with the oligonucleotides under conditions permitting amplification of the DRD4 gene; f) sequencing the product of the amplification; and g) detecting the presence of the DRD4 7R allele in the sample, thereby detecting and diagnosing ADHD.
14. A method for diagnosing ADHD in an individual, comprising the steps of: a) obtaining a tissue sample from the individual; b) providing an oligonucleotide complementary to the sense strand of a marker sequence found in an area of strong linkage disequilibrium with the DRD4 7R allele; c) providing an oligonucleotide complementary to the antisense strand of the marker sequence; d) treating the sample so as to expose DNA present in the sample; e) contacting the exposed DNA with the oligonucleotides under conditions permitting amplification of the marker sequence; f) sequencing the product of the amplification; and g) detecting the presence of the marker sequence in the sample, thereby detecting and diagnosing ADHD.